Long range infrared videos, such as the Defense Systems Information Analysis Center (DSIAC) videos, usually do not have high resolution. In recent years, there have been significant advances in video super-resolution algorithms. Here, we summarize our study on the use of super-resolution videos for target detection and classification. We observed that super-resolution can significantly improve detection and classification performance. For example, for 3000 m range videos, we were able to improve the average precision of target detection from 11% (without super-resolution) to 44% (with 4x super-resolution) and the overall accuracy of target classification from 10% (without super-resolution) to 44% (with 2x super-resolution).
Multi Image Deblurring using Complementary Sets of Fluttering Patterns by Mul...IRJET Journal
This document discusses a proposed method for multi-image deblurring using complementary sets of fluttering patterns and the alternating direction method of multipliers (ADMM). Existing methods for coded exposure and multi-image deblurring have limitations such as complex fluttering patterns, low signal-to-noise ratio, and loss of spectral information. The proposed method uses a multiplier algorithm to optimize a latent image and generate simple binary fluttering patterns for single or multiple input images. This helps reduce spectral loss and recover spatially consistent deblurred images with minimal noise. The method involves preprocessing the input image, setting regularization parameters, performing deconvolution iteratively using matrices, and outputting a deblurred image with sharp details and low noise.
This document compares the performance of image restoration techniques in the time and frequency domains. It proposes a new algorithm to denoise images corrupted by salt and pepper noise. The algorithm replaces noisy pixel values within a 3x3 window with a weighted median based on neighboring pixels. It applies filters like CLAHE, average, Wiener and median filtering before the proposed algorithm to further remove noise. Experimental results on test images show the proposed method achieves better noise removal compared to other techniques, with around a 60% increase in PSNR and 90% reduction in MSE. In conclusion, the proposed algorithm is effective at restoring images with high density salt and pepper noise.
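The weighted-median replacement described above can be sketched as follows. The summary does not give the window weights, so the kernel below is a hypothetical example in which the four nearest neighbours count twice:

```python
import numpy as np

def denoise_salt_pepper(img, weights=None):
    """Replace suspected salt-and-pepper pixels (values 0 or 255) with a
    weighted median of their 3x3 neighbourhood."""
    if weights is None:
        # Hypothetical weights: 4-connected neighbours count twice,
        # the (noisy) centre pixel is excluded.
        weights = np.array([[1, 2, 1],
                            [2, 0, 2],
                            [1, 2, 1]])
    out = img.copy()
    padded = np.pad(img, 1, mode='edge')
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            if img[y, x] in (0, 255):           # likely impulse noise
                window = padded[y:y + 3, x:x + 3]
                # Repeat each neighbour according to its weight, then
                # take the median of the expanded sample.
                sample = np.repeat(window.ravel(), weights.ravel())
                out[y, x] = np.median(sample)
    return out
```

Clean pixels are left untouched, which is why impulse-noise filters of this kind preserve detail better than a plain median applied everywhere.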
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...CSCJournals
In today's era of digitization and fast internet, many videos are uploaded to websites, so a mechanism is required to access these videos accurately and efficiently. Semantic concept detection achieves this task accurately and is used in many applications such as multimedia annotation, video summarization, indexing, and retrieval. Video retrieval based on semantic concepts is an efficient and challenging research area. Semantic concept detection bridges the semantic gap between low-level features extracted from key-frames or shots of a video and their high-level interpretation as semantics. It automatically assigns labels to videos from a predefined vocabulary, a task treated as a supervised machine learning problem. The support vector machine (SVM) emerged as the default classifier choice for this task, but recently deep convolutional neural networks (CNNs) have shown exceptional performance in this area; CNNs, however, require large datasets for training. In this paper, we present a framework for semantic concept detection using a hybrid model of SVM and CNN. Global features such as color moments, HSV histogram, wavelet transform, grey level co-occurrence matrix, and edge orientation histogram are selected as low-level features extracted from the annotated ground-truth video dataset of TRECVID. In a second pipeline, deep features are extracted using a pretrained CNN. The dataset is partitioned into three segments to deal with the data imbalance issue. The two classifiers are trained separately on all segments, and a fusion of scores is performed to detect the concepts in the test dataset. System performance is evaluated using Mean Average Precision for the multi-label dataset, and the performance of the proposed hybrid framework is comparable to existing approaches.
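The score-fusion step in the abstract above can be sketched as follows. The summary does not specify the fusion rule, so a simple convex combination with a hypothetical mixing weight `alpha` and decision threshold is assumed:

```python
import numpy as np

def fuse_scores(svm_scores, cnn_scores, alpha=0.5):
    """Weighted late fusion of per-concept scores from two classifiers.

    `alpha` is a hypothetical mixing weight; the paper describes score
    fusion but not its exact form, so a convex combination is assumed.
    """
    svm_scores = np.asarray(svm_scores, dtype=float)
    cnn_scores = np.asarray(cnn_scores, dtype=float)
    return alpha * svm_scores + (1.0 - alpha) * cnn_scores

def detect_concepts(fused, vocabulary, threshold=0.5):
    """Assign every concept whose fused score clears the threshold
    (multi-label detection: several concepts may fire per shot)."""
    return [c for c, s in zip(vocabulary, fused) if s >= threshold]
```

For example, SVM scores `[0.8, 0.2]` and CNN scores `[0.4, 0.6]` over a vocabulary `["car", "sky"]` fuse to `[0.6, 0.4]`, detecting only "car" at the default threshold.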
5 ijaems sept-2015-9-video feature extraction based on modified lle using ada...INFOGAIN PUBLICATION
Locally linear embedding (LLE) is an unsupervised learning algorithm which computes low-dimensional, neighborhood-preserving embeddings of high-dimensional data. LLE attempts to discover non-linear structure in high-dimensional data by exploiting the local symmetries of linear reconstructions. In this paper, video feature extraction is performed using a modified LLE along with an adaptive nearest-neighbor approach to find the nearest neighbors and the connected components. The proposed feature extraction method is applied to a video, and the resulting video feature description provides a new tool for video analysis.
Land Cover Feature Extraction using Hybrid Swarm Intelligence Techniques - A ...IDES Editor
This document presents a hybrid algorithm using biogeography-based optimization (BBO) and ant colony optimization (ACO) for land cover feature extraction from remote sensing images. The algorithm first analyzes a training image to identify features that BBO and ACO classify efficiently. It then applies BBO to clusters containing these features and ACO to remaining clusters. An evaluation shows the hybrid algorithm achieves a higher kappa coefficient of 0.97 compared to 0.67 for BBO alone, indicating better classification accuracy. The authors conclude the algorithm effectively handles uncertainties in remote sensing images and future work could improve efficiency further.
This doctoral dissertation examines facial skin motion properties from video for modeling and applications. It presents two methods for computing strain patterns from video: a finite difference method and a finite element method. The finite element method incorporates material properties of facial tissues by modeling their Young's modulus values. Experiments show strain patterns are discriminative and stable features for facial expression recognition, age estimation, and person identification. The dissertation also develops a method for expression invariant face matching by modeling Young's modulus from multiple expressions.
Our paper on a homogeneous-motion-discovery-oriented reference frame for High Efficiency Video Coding presents the idea of segmenting the current frame into cohesive motion regions made of blocks and then using these regions to form a motion-compensated prediction. When used as an additional reference frame for the current frame, this prediction shows encouraging bit-rate savings over the standalone HEVC reference coder.
A DCT-BASED TOTAL JND PROFILE FOR SPATIO-TEMPORAL AND FOVEATED MASKING EFFECTS Nexgen Technology
PREVENTING COPYRIGHTS INFRINGEMENT OF IMAGES BY WATERMARKING IN TRANSFORM DOM...ijistjournal
1) The document discusses a method for preventing copyright infringement of images using watermarking in the transform domain and a full counter propagation neural network.
2) It aims to encode the host image before watermark embedding to enhance security. The fast and effective full counter propagation neural network then helps successfully embed the watermark without deteriorating the image quality.
3) Previous techniques embedded watermarks directly in images, but the authors find neural network synapses provide a better way to reduce distortion and increase message capacity when embedding watermarks.
In this paper, a novel method for image enhancement using the PDTDFB (Pyramidal Dual-Tree Directional Filter Bank) and interpolation is adopted. In digital images, different kinds of noise strongly affect various image processing techniques, so it is always better to perform denoising first. Here, the image is first decomposed into two layers, a low-pass sub-band and a high-pass sub-band, after which denoising is performed on both layers to smooth the image. The smoothed image is then interpolated using edge-preserving interpolation and amplified. Finally, the HR (High Resolution) image is obtained by performing image composition.
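The decompose/denoise/interpolate pipeline above can be sketched with simple stand-ins: a box blur replaces the PDTDFB decomposition, soft thresholding serves as the denoiser, and nearest-neighbour upsampling stands in for the edge-preserving interpolation. All parameters here are illustrative assumptions:

```python
import numpy as np

def box_blur(img):
    """3x3 box blur used here as a stand-in low-pass filter
    (the paper uses a PDTDFB decomposition, which is far more elaborate)."""
    h, w = img.shape
    p = np.pad(img.astype(float), 1, mode='edge')
    return sum(p[dy:dy + h, dx:dx + w]
               for dy in range(3) for dx in range(3)) / 9.0

def enhance(img, scale=2):
    low = box_blur(img)            # low-pass sub-band
    high = img - low               # high-pass sub-band (detail layer)
    # "Denoising": shrink small detail coefficients (soft threshold).
    high = np.sign(high) * np.maximum(np.abs(high) - 1.0, 0.0)
    smooth = low + high
    # Nearest-neighbour upsampling as a placeholder for the paper's
    # edge-preserving interpolation.
    up = np.kron(smooth, np.ones((scale, scale)))
    return 1.1 * up                # mild amplification before composition
```

The structure, not the specific filters, is the point: split into sub-bands, denoise each, recombine, then upscale.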
This paper introduces an efficient multi-resolution watermarking methodology for copyright protection of digital images. By adapting the watermark signal to the wavelet coefficients, the proposed method is highly image adaptive, and the watermark signal can be strengthened in the most significant parts of the image. As this property also increases watermark visibility, a model of the human visual system is incorporated to keep the embedded watermark signal perceptually invisible. Experimental results show that the proposed system preserves image quality and is robust against the most common image processing distortions. Furthermore, the hierarchical nature of the wavelet transform allows detection of the watermark at various resolutions, reducing the computational load needed for watermark detection depending on the noise level. The performance of the proposed system is shown to be superior to that of other schemes reported in the literature.
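The coefficient-adaptive embedding idea can be illustrated with a one-level Haar transform. The paper's wavelet choice and gain control are not specified in this summary, so the Haar basis and the `strength` parameter below are assumptions:

```python
import numpy as np

def haar2d(img):
    """One-level 2-D Haar transform: approximation + 3 detail bands."""
    a = (img[0::2, 0::2] + img[0::2, 1::2] + img[1::2, 0::2] + img[1::2, 1::2]) / 4
    h = (img[0::2, 0::2] - img[0::2, 1::2] + img[1::2, 0::2] - img[1::2, 1::2]) / 4
    v = (img[0::2, 0::2] + img[0::2, 1::2] - img[1::2, 0::2] - img[1::2, 1::2]) / 4
    d = (img[0::2, 0::2] - img[0::2, 1::2] - img[1::2, 0::2] + img[1::2, 1::2]) / 4
    return a, h, v, d

def ihaar2d(a, h, v, d):
    """Exact inverse of haar2d."""
    out = np.empty((a.shape[0] * 2, a.shape[1] * 2))
    out[0::2, 0::2] = a + h + v + d
    out[0::2, 1::2] = a - h + v - d
    out[1::2, 0::2] = a + h - v - d
    out[1::2, 1::2] = a - h - v + d
    return out

def embed(img, watermark, strength=0.1):
    """Image-adaptive embedding: the watermark is scaled by the magnitude
    of the detail coefficients, so it is strongest where the image has
    significant structure. `strength` is a hypothetical global gain."""
    a, h, v, d = haar2d(img)
    h = h + strength * np.abs(h) * watermark
    return ihaar2d(a, h, v, d)
```

Because the perturbation is proportional to `|h|`, flat regions (where changes would be visible) receive almost no watermark energy, which is the image-adaptivity property the abstract describes.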
IRJET- Human Fall Detection using Co-Saliency-Enhanced Deep Recurrent Convolu...IRJET Journal
This document summarizes a research paper that proposes a new method for detecting human falls in videos using deep learning. The method uses a recurrent convolutional neural network (RCN) that applies convolutional neural networks (CNNs) to video segments and connects them with long short-term memory (LSTM) to model temporal relationships. It also enhances video frames using co-saliency detection to highlight important human activity regions before feeding them to the RCN. The researchers tested the method on a dataset of 768 video clips from 4 activity classes and achieved 98.12% accuracy at detecting falls, demonstrating the effectiveness of the co-saliency-enhanced RCN approach.
IRJET - Underwater Object Identification using Matlab and MachineIRJET Journal
This document discusses underwater object identification using MATLAB and machine learning. It begins with an abstract that outlines using image processing techniques like color correction and enhancement to improve underwater image quality and resolution for object detection. The methodology section then describes the process, which includes image acquisition, preprocessing like color conversion and noise removal, feature extraction to determine object type, and using a NodeMCU to send data to the cloud. It tests this approach by capturing images of fish underwater and classifying them by type. The results show enhanced, higher quality images compared to the originals. In conclusion, this method effectively removes color distortions and increases contrast to identify underwater objects using deep learning frameworks.
This document discusses objective video quality measurement based on the human visual system. It introduces various deblocking algorithms used to improve the quality of reconstructed video by reducing blocking artifacts. It also discusses limitations of traditional PSNR metrics and proposes a no-reference quality assessment method. The proposed method considers aspects of the human visual system like masking effects and uses algorithms in the DCT domain and post-processing to evaluate video quality in a way that correlates better with subjective human perception. Experimental results on distorted video sets demonstrate the effectiveness of the proposed no-reference quality measurement approach.
This document discusses image processing and summarizes several key techniques. It begins by defining image processing and describing how images are digitized and processed. It then summarizes three main categories of image processing: image enhancement, image restoration, and image compression. Specific techniques discussed include contrast stretching, density slicing, and edge enhancement. The document also discusses visual saliency models, motion saliency, and using conditional random fields for video object extraction.
Efficient video indexing for fast motion videoijcga
Due to advances in recent multimedia technologies, various digital video contents have become available from different multimedia sources. Efficient management, storage, coding, and indexing of video are required because video contains a large amount of visual information and requires a large amount of memory. This paper proposes an efficient video indexing method for video with rapid motion or fast illumination change, in which motion information and feature points of specific objects are used. For accurate shot boundary detection, we make use of two steps: a block matching algorithm to obtain accurate motion information, and a modified displaced frame difference to compensate for the error in existing methods. We also propose an object matching algorithm based on the scale invariant feature transform, which uses feature points to group shots semantically. Computer simulations with five fast-motion videos show the effectiveness of the proposed video indexing method.
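The displaced-frame-difference idea behind the shot-boundary step can be sketched as follows; the block size, search range, and decision threshold are hypothetical, and the paper's modification of the difference measure is not reproduced:

```python
import numpy as np

def block_match(prev, curr, block=8, search=4):
    """Exhaustive block matching; returns the mean minimum block error,
    i.e. a motion-compensated (displaced) frame difference."""
    h, w = curr.shape
    errors = []
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            target = curr[by:by + block, bx:bx + block].astype(float)
            best = np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        cand = prev[y:y + block, x:x + block].astype(float)
                        best = min(best, np.mean(np.abs(target - cand)))
            errors.append(best)
    return float(np.mean(errors))

def is_shot_boundary(prev, curr, threshold=20.0):
    """Flag a shot cut when even the best motion-compensated match
    leaves a large residual. The threshold is a hypothetical value."""
    return block_match(prev, curr) > threshold
```

Pure camera or object motion produces a small residual (some displacement matches well), while a genuine cut leaves a large residual at every displacement, which is what makes the displaced difference more robust than a plain frame difference for fast-motion video.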
Image Authentication Using Digital Watermarkingijceronline
An Approach for Image Deblurring: Based on Sparse Representation and Regulari...IRJET Journal
This document presents an approach for image deblurring based on sparse representation and a regularized filter. The approach involves splitting the blurred input image into patches, estimating sparse coefficients for each patch, learning dictionaries from the coefficients, and merging the patches. The merged patches are subtracted from the blurred image to obtain the deblur kernel. Wiener deconvolution with the kernel is then applied and followed by a regularized filter to recover the original image without blurring. The approach was tested on MATLAB and evaluation metrics like RMSE, PSNR, and SSIM showed it performed better than existing methods, recovering images with more details and contrast.
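The Wiener deconvolution step can be sketched in the frequency domain. The noise-to-signal ratio `nsr` is an assumed parameter, and the sparse-representation kernel estimation described above is not reproduced here:

```python
import numpy as np

def wiener_deconvolve(blurred, kernel, nsr=0.01):
    """Frequency-domain Wiener deconvolution.

    W = H* / (|H|^2 + nsr) approximates the inverse filter 1/H while
    suppressing frequencies where the kernel response H is weak and
    noise would otherwise be amplified.
    """
    H = np.fft.fft2(kernel, s=blurred.shape)   # zero-padded kernel spectrum
    G = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)    # Wiener filter
    return np.real(np.fft.ifft2(W * G))
```

With `nsr=0` this degenerates to naive inverse filtering; in practice a small positive `nsr` is what keeps the recovered image from being dominated by amplified noise.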
The document proposes a selective data pruning-based compression scheme to improve rate-distortion performance. It involves pruning original frames to a smaller size before compression by dropping rows or columns. After decoding, frames are interpolated back to the original size using an edge-directed interpolation method. A novel high-order interpolation is also introduced to adapt to multiple edge directions. Simulation results validate the effectiveness of the proposed methods in image interpolation and video coding applications by achieving high quality from lower bitrates compared to existing techniques.
This document discusses a proposed approach for multi-focus image fusion using a discrete cosine wavelet sharpness criterion. Multi-focus image fusion combines information from multiple images of the same scene to produce an "all-in-focus" image. The proposed approach uses a discrete cosine transform to calculate sharpness values for sub-blocks of the input images and selects the sharpest sub-blocks to include in the fused image. Experimental results on images of a clock, bottle, and book show the discrete cosine wavelet criterion produces fused images with higher quality than a bilateral gradient-based sharpness criterion, as measured by mutual information metrics.
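The block-wise sharpness selection can be sketched as follows, using the AC (non-DC) energy of a plain 2-D DCT as a stand-in for the paper's discrete cosine wavelet criterion; the block size is an assumption:

```python
import numpy as np

def dct2(block):
    """Naive 2-D DCT-II of a square block via the orthonormal
    transform matrix (avoids a SciPy dependency)."""
    n = block.shape[0]
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C @ block @ C.T

def fuse(img_a, img_b, block=8):
    """Per-block selection: keep the block whose AC energy is larger,
    a simple proxy for the sharpness criterion in the paper."""
    out = np.empty(img_a.shape, dtype=float)
    h, w = img_a.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            a = img_a[y:y + block, x:x + block].astype(float)
            b = img_b[y:y + block, x:x + block].astype(float)
            ta, tb = dct2(a), dct2(b)
            ea = (ta ** 2).sum() - ta[0, 0] ** 2   # energy minus DC term
            eb = (tb ** 2).sum() - tb[0, 0] ** 2
            out[y:y + block, x:x + block] = a if ea >= eb else b
    return out
```

In-focus regions carry more high-frequency (AC) energy than defocused ones, so picking the higher-energy block per location assembles an approximately all-in-focus result.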
This document discusses a structural similarity based approach for efficient multi-view video coding. It begins with an introduction to multi-view video coding and the structural similarity index metric. It then proposes using structural similarity to exploit structural information between different video views. The method uses structural similarity for rate distortion optimization in encoding. Experimental results show the left and right views of a video, their structural similarity image, the decoded 3D video, and the achieved minimum distortion level. The document aims to improve multi-view video quality by using structural similarity during the encoding process.
IRJET- Exploring Image Super Resolution TechniquesIRJET Journal
This document discusses image super resolution techniques. It begins by defining super resolution as a technique that reconstructs a high resolution image from low resolution images. It then provides an overview of different super resolution methods including interpolation-based, reconstruction-based, and example-based (machine learning) techniques. The document evaluates state-of-the-art super resolution generative adversarial network (SRGAN) methods and their ability to generate realistic high resolution images from low resolution inputs. It also reviews the history and compares different super resolution techniques.
Survey on Image Integration of Misaligned ImagesIRJET Journal
The document discusses methods for integrating misaligned images to improve image quality under low lighting conditions. It reviews previous works that combine images like flash/no-flash pairs to transfer details and color, but have limitations when images are misaligned. The paper proposes a new method using a long-exposure image and flash image that introduces a local linear model to transfer color while maintaining natural colors and high contrast, without deteriorating contrast for misaligned pairs. It concludes that handling misaligned images remains a challenge with existing methods and further work is needed.
MULTIMODAL BIOMETRICS RECOGNITION FROM FACIAL VIDEO VIA DEEP LEARNINGcsandit
Biometric identification using multiple modalities has attracted the attention of many researchers, as it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing the different modalities present in the facial video clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising autoencoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers to perform the multimodal recognition. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) resulted in 99.17% and 97.14% rank-1 recognition rates, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips.
Practical Approaches to Target Detection in Long Range and Low Quality Infrar...sipij
It is challenging to detect vehicles in long range and low quality infrared videos using deep learning techniques such as You Only Look Once (YOLO), mainly due to small target size: small targets do not have detailed texture information. This paper focuses on practical approaches to target detection in infrared videos using deep learning techniques. We first investigated a newer version of YOLO (YOLO v4). We then proposed a practical and effective approach of training the YOLO model using videos from longer ranges. Experimental results using real infrared videos ranging from 1000 m to 3500 m demonstrated large performance improvements. In particular, the average detection percentage over the six ranges from 1000 m to 3500 m improved from 54% when we used the 1500 m videos for training to 95% when we used the 3000 m videos for training.
A DCT-BASED TOTAL JND PROFILE FORSPATIO-TEMPORAL AND FOVEATED MASKING EFFECTSNexgen Technology
PREVENTING COPYRIGHTS INFRINGEMENT OF IMAGES BY WATERMARKING IN TRANSFORM DOM...ijistjournal
1) The document discusses a method for preventing copyright infringement of images using watermarking in the transform domain and a full counter propagation neural network.
2) It aims to encode the host image before watermark embedding to enhance security. The fast and effective full counter propagation neural network then helps successfully embed the watermark without deteriorating the image quality.
3) Previous techniques embedded watermarks directly in images, but the authors find neural network synapses provide a better way to reduce distortion and increase message capacity when embedding watermarks.
In this paper, a novel method for image enhancement using the PDTDFB (Pyramidal Dual-Tree Directional Filter Bank) and interpolation is adopted. Since various kinds of noise strongly affect image processing techniques, it is generally better to perform denoising first. The image is first decomposed into two layers, a low-pass sub-band and a high-pass sub-band, and denoising is performed on both layers to smooth the image. The smoothed image is then interpolated using edge-preserving interpolation and amplified. Finally, the HR (High Resolution) image is obtained by image composition.
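The pipeline described above (decompose into sub-bands, denoise, interpolate, recompose) can be sketched as follows. This is an illustrative skeleton, not the authors' implementation: a box filter stands in for the PDTDFB low-pass band, a hard threshold for the denoising step, and nearest-neighbour upsampling for the edge-preserving interpolation.

```python
import numpy as np

def box_blur(img, k=3):
    # Simple separable box filter as a stand-in for the PDTDFB low-pass band.
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def enhance(img, scale=2, thresh=5.0):
    low = box_blur(img)              # low-pass sub-band
    high = img - low                 # high-pass sub-band
    high[np.abs(high) < thresh] = 0  # hard-threshold denoising of the high band
    smooth = low + high              # recompose the denoised image
    # Nearest-neighbour upsampling as a placeholder for edge-preserving interpolation.
    return np.repeat(np.repeat(smooth, scale, axis=0), scale, axis=1)
```

The threshold and scale factor here are arbitrary illustrative choices; the paper's actual filter bank is directional and multi-scale.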
This paper introduces an efficient multi-resolution watermarking methodology for copyright protection of digital images. By adapting the watermark signal to the wavelet coefficients, the proposed method is highly image adaptive, and the watermark signal can be strengthened in the most significant parts of the image. As this property also increases watermark visibility, a model of the human visual system is incorporated to prevent perceptual visibility of the embedded watermark signal. Experimental results show that the proposed system preserves image quality and is robust against most common image processing distortions. Furthermore, the hierarchical nature of the wavelet transform allows detection of the watermark at various resolutions, reducing the computational load needed for watermark detection depending on the noise level. The performance of the proposed system is shown to be superior to that of other schemes reported in the literature.
IRJET- Human Fall Detection using Co-Saliency-Enhanced Deep Recurrent Convolu...IRJET Journal
This document summarizes a research paper that proposes a new method for detecting human falls in videos using deep learning. The method uses a recurrent convolutional neural network (RCN) that applies convolutional neural networks (CNNs) to video segments and connects them with long short-term memory (LSTM) to model temporal relationships. It also enhances video frames using co-saliency detection to highlight important human activity regions before feeding them to the RCN. The researchers tested the method on a dataset of 768 video clips from 4 activity classes and achieved 98.12% accuracy at detecting falls, demonstrating the effectiveness of the co-saliency-enhanced RCN approach.
IRJET - Underwater Object Identification using Matlab and MachineIRJET Journal
This document discusses underwater object identification using MATLAB and machine learning. It begins with an abstract that outlines using image processing techniques like color correction and enhancement to improve underwater image quality and resolution for object detection. The methodology section then describes the process, which includes image acquisition, preprocessing like color conversion and noise removal, feature extraction to determine object type, and using a NodeMCU to send data to the cloud. It tests this approach by capturing images of fish underwater and classifying them by type. The results show enhanced, higher quality images compared to the originals. In conclusion, this method effectively removes color distortions and increases contrast to identify underwater objects using deep learning frameworks.
This document discusses objective video quality measurement based on the human visual system. It introduces various deblocking algorithms used to improve the quality of reconstructed video by reducing blocking artifacts. It also discusses limitations of traditional PSNR metrics and proposes a no-reference quality assessment method. The proposed method considers aspects of the human visual system like masking effects and uses algorithms in the DCT domain and post-processing to evaluate video quality in a way that correlates better with subjective human perception. Experimental results on distorted video sets demonstrate the effectiveness of the proposed no-reference quality measurement approach.
This document discusses image processing and summarizes several key techniques. It begins by defining image processing and describing how images are digitized and processed. It then summarizes three main categories of image processing: image enhancement, image restoration, and image compression. Specific techniques discussed include contrast stretching, density slicing, and edge enhancement. The document also discusses visual saliency models, motion saliency, and using conditional random fields for video object extraction.
Efficient video indexing for fast motion videoijcga
Due to advances in recent multimedia technologies, various digital video contents become available from different multimedia sources. Efficient management, storage, coding, and indexing of video are required because video contains lots of visual information and requires a large amount of memory. This paper proposes an efficient video indexing method for video with rapid motion or fast illumination change, in which motion information and feature points of specific objects are used. For accurate shot boundary detection, we make use of two steps: block matching algorithm to obtain accurate motion information and modified displaced frame difference to compensate for the error in existing methods. We also propose an object matching algorithm based on the scale invariant feature transform, which uses feature points to group shots semantically. Computer simulation with five fast-motion video shows the effectiveness of the proposed video indexing method.
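The block-matching step used above for shot-boundary detection can be sketched as below. The block size, search range, and the idea of thresholding the mean matching error are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

def block_match(prev, curr, block=8, search=4):
    """Exhaustive block matching: for each block of curr, find the minimum
    mean absolute difference within a small search window of prev.
    A large average error suggests a shot boundary or very fast motion."""
    h, w = curr.shape
    total, count = 0.0, 0
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            ref = curr[y:y + block, x:x + block]
            best = np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - block and 0 <= xx <= w - block:
                        cand = prev[yy:yy + block, xx:xx + block]
                        best = min(best, float(np.abs(ref - cand).mean()))
            total += best
            count += 1
    return total / count
```

A shot-boundary detector would compare this error against a threshold per frame pair; the paper additionally corrects it with a modified displaced frame difference.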
Image Authentication Using Digital Watermarkingijceronline
An Approach for Image Deblurring: Based on Sparse Representation and Regulari...IRJET Journal
This document presents an approach for image deblurring based on sparse representation and a regularized filter. The approach involves splitting the blurred input image into patches, estimating sparse coefficients for each patch, learning dictionaries from the coefficients, and merging the patches. The merged patches are subtracted from the blurred image to obtain the deblur kernel. Wiener deconvolution with the kernel is then applied and followed by a regularized filter to recover the original image without blurring. The approach was tested on MATLAB and evaluation metrics like RMSE, PSNR, and SSIM showed it performed better than existing methods, recovering images with more details and contrast.
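The Wiener deconvolution step in the pipeline above can be sketched in the frequency domain. This is the generic textbook Wiener filter, not the paper's code; the noise-to-signal constant `k` is an illustrative assumption.

```python
import numpy as np

def wiener_deconvolve(blurred, kernel, k=0.01):
    # Frequency-domain Wiener filter: W = H* / (|H|^2 + k), applied to the
    # blurred image. k regularizes frequencies where the kernel response is weak.
    H = np.fft.fft2(kernel, s=blurred.shape)
    B = np.fft.fft2(blurred)
    W = np.conj(H) / (np.abs(H) ** 2 + k)
    return np.real(np.fft.ifft2(B * W))
```

In the paper this is followed by a regularized filter to suppress the ringing that Wiener deconvolution leaves behind.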
The document proposes a selective data pruning-based compression scheme to improve rate-distortion performance. It involves pruning original frames to a smaller size before compression by dropping rows or columns. After decoding, frames are interpolated back to the original size using an edge-directed interpolation method. A novel high-order interpolation is also introduced to adapt to multiple edge directions. Simulation results validate the effectiveness of the proposed methods in image interpolation and video coding applications by achieving high quality from lower bitrates compared to existing techniques.
This document discusses a proposed approach for multi-focus image fusion using a discrete cosine wavelet sharpness criterion. Multi-focus image fusion combines information from multiple images of the same scene to produce an "all-in-focus" image. The proposed approach uses a discrete cosine transform to calculate sharpness values for sub-blocks of the input images and selects the sharpest sub-blocks to include in the fused image. Experimental results on images of a clock, bottle, and book show the discrete cosine wavelet criterion produces fused images with higher quality than a bilateral gradient-based sharpness criterion, as measured by mutual information metrics.
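The select-the-sharpest-block idea behind this fusion scheme can be sketched as follows. Gradient energy is used here as a simple stand-in for the paper's DCT-based sharpness criterion, and the block size is an illustrative assumption.

```python
import numpy as np

def sharpness(block):
    # Gradient energy as a simple proxy for the DCT sharpness score.
    gy, gx = np.gradient(block.astype(float))
    return float((gx ** 2 + gy ** 2).sum())

def fuse(img_a, img_b, block=8):
    # Pick, block by block, whichever input is sharper: a crude all-in-focus fusion.
    out = np.empty_like(img_a, dtype=float)
    h, w = img_a.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            a = img_a[y:y + block, x:x + block]
            b = img_b[y:y + block, x:x + block]
            out[y:y + block, x:x + block] = a if sharpness(a) >= sharpness(b) else b
    return out
```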
This document discusses a structural similarity based approach for efficient multi-view video coding. It begins with an introduction to multi-view video coding and the structural similarity index metric. It then proposes using structural similarity to exploit structural information between different video views. The method uses structural similarity for rate distortion optimization in encoding. Experimental results show the left and right views of a video, their structural similarity image, the decoded 3D video, and the achieved minimum distortion level. The document aims to improve multi-view video quality by using structural similarity during the encoding process.
IRJET- Exploring Image Super Resolution TechniquesIRJET Journal
This document discusses image super resolution techniques. It begins by defining super resolution as a technique that reconstructs a high resolution image from low resolution images. It then provides an overview of different super resolution methods including interpolation-based, reconstruction-based, and example-based (machine learning) techniques. The document evaluates state-of-the-art super resolution generative adversarial network (SRGAN) methods and their ability to generate realistic high resolution images from low resolution inputs. It also reviews the history and compares different super resolution techniques.
Survey on Image Integration of Misaligned ImagesIRJET Journal
The document discusses methods for integrating misaligned images to improve image quality under low lighting conditions. It reviews previous works that combine images like flash/no-flash pairs to transfer details and color, but have limitations when images are misaligned. The paper proposes a new method using a long-exposure image and flash image that introduces a local linear model to transfer color while maintaining natural colors and high contrast, without deteriorating contrast for misaligned pairs. It concludes that handling misaligned images remains a challenge with existing methods and further work is needed.
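The local linear colour model mentioned above can be illustrated in its simplest, single-window form: fit target ≈ a·guide + b by least squares and apply the fit. The paper applies this per local window with regularization; the global version below is only the building block, and the epsilon term is an illustrative assumption.

```python
import numpy as np

def linear_color_transfer(guide, target, eps=1e-6):
    # Least-squares fit of target ≈ a * guide + b over a single window.
    a = ((guide * target).mean() - guide.mean() * target.mean()) / (guide.var() + eps)
    b = target.mean() - a * guide.mean()
    return a * guide + b
```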
MULTIMODAL BIOMETRICS RECOGNITION FROM FACIAL VIDEO VIA DEEP LEARNINGcsandit
Biometrics identification using multiple modalities has attracted the attention of many researchers as it produces more robust and trustworthy results than single modality biometrics.
In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing the different modalities present in the clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising autoencoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers to perform the multimodal recognition. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) resulted in 99.17% and 97.14% rank-1 recognition rates, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips.
06 13sept 8313 9997-2-ed an adaptive (edit lafi)IAESIJEECS
A robust Adaptive Reconstruction Error Minimization Convolutional Neural Network (ARemCNN) architecture is introduced to provide high reconstruction quality from low resolution using a parallel configuration. The proposed model can easily be trained on bulky datasets such as YUV21 and Videoset4. Experimental results show that the model outperforms many existing techniques in terms of PSNR, SSIM, and reconstruction quality. On the Videoset4 dataset, the average PSNR is 39.81 for upscale-2, 35.56 for upscale-3, and 33.77 for upscale-4, which is very high in contrast to other existing techniques. Similarly, on the YUV21 dataset the average PSNR is 38.71 for upscale-2, 34.58 for upscale-3, and 33.047 for upscale-4.
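PSNR, the headline metric in these comparisons, is straightforward to compute. The helper below is the generic definition (10·log10 of peak² over mean squared error), not code from the paper.

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    # Peak signal-to-noise ratio in dB; higher means closer to the reference.
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```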
Target Detection and Classification Improvements using Contrast Enhanced 16-b...sipij
In our earlier target detection and classification papers, we used 8-bit infrared videos in the Defense
Systems Information Analysis Center (DSIAC) video dataset. In this paper, we focus on how we can
improve the target detection and classification results using 16-bit videos. One problem with the 16-bit
videos is that some image frames have very low contrast. Two methods were explored to improve upon
previous detection and classification results. The first method used to improve contrast was effectively the
same as the baseline 8-bit video data but using the 16-bit raw data rather than the 8-bit data taken from
the avi files. The second method used was a second order histogram matching algorithm that preserves the
16-bit nature of the videos while providing normalization and contrast enhancement. Results showed the
second order histogram matching algorithm improved the target detection using You Only Look Once
(YOLO) and classification using Residual Network (ResNet). The average precision (AP) metric in YOLO improved by 8%, which is quite significant. The overall accuracy (OA) of ResNet improved by 12%, which is also very significant.
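First-order histogram matching, the building block behind both contrast-enhancement methods above, maps source grey levels through the inverse of the reference CDF. The sketch below is that first-order step only; the paper's second-order variant, which additionally preserves the 16-bit statistics, is not reproduced here.

```python
import numpy as np

def match_histogram(source, reference):
    """Map source grey levels so the CDF of the output tracks the reference CDF."""
    s_vals, s_idx, s_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    r_vals, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    r_cdf = np.cumsum(r_counts) / reference.size
    mapped = np.interp(s_cdf, r_cdf, r_vals)  # invert the reference CDF
    return mapped[s_idx].reshape(source.shape)
```

Because it works on float grey levels, the same function applies to 16-bit frames without first quantizing to 8 bits.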
TARGET DETECTION AND CLASSIFICATION IMPROVEMENTS USING CONTRAST ENHANCED 16-B...sipij
This document describes research on improving target detection and classification in infrared videos using 16-bit data and contrast enhancement techniques. Two contrast enhancement methods are explored: 1) histogram matching 16-bit videos to an 8-bit reference frame, and 2) a second order histogram matching algorithm that preserves the 16-bit nature of videos while enhancing contrast. Experimental results showed that the second method improved target detection performance using YOLO and classification performance using ResNet compared to previous 8-bit results and the first histogram matching method.
VIDEO BASED SIGN LANGUAGE RECOGNITION USING CNN-LSTMIRJET Journal
This document presents a proposed method for video-based sign language recognition using convolutional neural networks (CNN) and long short-term memory (LSTM). The method uses CNN to extract spatial features from video frames of sign language and LSTM to analyze the temporal characteristics of the frames to recognize the sign. Color segmentation is used to isolate the hands from video frames by detecting colored gloves worn by the signer. CNN is trained on spatial features from frames to classify signs, and LSTM is used to analyze the sequential features from CNN to recognize signs in full videos. The proposed method achieved 94% accuracy on sign recognition in testing.
Real Time Sign Language Recognition Using Deep LearningIRJET Journal
The document describes a study that used the YOLOv5 deep learning model to perform real-time sign language recognition. The researchers trained and tested the model on the Roboflow dataset along with additional images. They achieved 88.4% accuracy, 76.6% precision, and 81.2% recall. For comparison, they also trained a CNN model which achieved lower accuracy of 52.98%. The YOLOv5 model was able to detect signs in complex environments and perform accurate real-time detection, demonstrating its advantages over CNN for this task.
Image super resolution using Generative Adversarial Network.IRJET Journal
This document discusses using a generative adversarial network (GAN) for image super resolution. It begins with an abstract that explains super resolution aims to increase image resolution by adding sub-pixel detail. Convolutional neural networks are well-suited for this task. Recent years have seen interest in reconstructing super resolution video sequences from low resolution images. The document then reviews literature on image super resolution techniques including deep learning methods. It describes the methodology which uses a CNN to compare input images to a trained dataset to predict if high-resolution images can be generated from low-resolution images.
IRJET - Applications of Image and Video Deduplication: A SurveyIRJET Journal
This document discusses applications of image and video deduplication techniques. It begins by providing background on the growth of multimedia data and need for deduplication to reduce redundant data. It then describes key aspects of image and video deduplication, including extracting fingerprints from images and frames to identify duplicates. The document reviews several studies on image and video deduplication applications, such as identifying near-duplicate images on social media, detecting spoofed face images, verifying image copy detection, and eliminating near-duplicates from visual sensor networks. Overall, the document surveys various real-world implementations of image and video deduplication.
Enhance Example-Based Super Resolution to Achieve Fine Magnification of Low ...IJMER
This paper proposes an enhanced example-based super resolution method to achieve fine magnification of low resolution images using a neighbor embedding method. The proposed method has two phases: 1) A dictionary construction phase that extracts patch pairs from high and low resolution images and stores them in a dictionary. K-means trees are used to organize the patches. 2) A super resolution phase that searches the dictionary using neighbor embedding to find the best matching high resolution patch to synthesize the output image. Non-local mean filtering is then applied to remove blurring. Experimental results on text images show the proposed method improves image quality over other techniques by reducing blurring and artifacts.
This document presents a novel approach for jointly optimizing spatial prediction and transform coding in video compression. It aims to improve performance and reduce complexity compared to existing techniques. The proposed method uses singular value decomposition (SVD) to compress images. SVD decomposes an image matrix into three matrices, allowing the image to be approximated using only a few singular values. This achieves compression by removing redundant information. The document outlines the SVD approach for image compression and measures compression performance using compression ratio and mean squared error between the original and compressed images. It then discusses trends in image and video coding, including combining natural and synthetic content. Finally, it provides a block diagram of the proposed system and compares its compression performance to existing discrete cosine transform-
Large-scale Video Classification with Convolutional Neural Net.docxcroysierkathey
Large-scale Video Classification with Convolutional Neural Networks
Andrej Karpathy1,2 George Toderici1 Sanketh Shetty1
[email protected][email protected][email protected]
Thomas Leung1 Rahul Sukthankar1 Li Fei-Fei2
[email protected][email protected][email protected]
1Google Research 2Computer Science Department, Stanford University
http://cs.stanford.edu/people/karpathy/deepvideo
Abstract
Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image recognition problems. Encouraged by these results, we provide an extensive empirical evaluation of CNNs on large-scale video classification using a new dataset of 1 million YouTube videos belonging to 487 classes. We study multiple approaches for extending the connectivity of a CNN in the time domain to take advantage of local spatio-temporal information and suggest a multiresolution, foveated architecture as a promising way of speeding up the training. Our best spatio-temporal networks display significant performance improvements compared to strong feature-based baselines (55.3% to 63.9%), but only a surprisingly modest improvement compared to single-frame models (59.3% to 60.9%). We further study the generalization performance of our best model by retraining the top layers on the UCF-101 Action Recognition dataset and observe significant performance improvements compared to the UCF-101 baseline model (63.3% up from 43.9%).
1. Introduction
Images and videos have become ubiquitous on the internet, which has encouraged the development of algorithms that can analyze their semantic content for various applications, including search and summarization. Recently, Convolutional Neural Networks (CNNs) [15] have been demonstrated as an effective class of models for understanding image content, giving state-of-the-art results on image recognition, segmentation, detection and retrieval [11, 3, 2, 20, 9, 18]. The key enabling factors behind these results were techniques for scaling up the networks to tens of millions of parameters and massive labeled datasets that can support the learning process. Under these conditions, CNNs have been shown to learn powerful and interpretable image features [28]. Encouraged by positive results in the domain of images, we study the performance of CNNs in large-scale video classification, where the networks have access to not only the appearance information present in single, static images, but also their complex temporal evolution. There are several challenges to extending and applying CNNs in this setting.
From a practical standpoint, there are currently no video classification benchmarks that match the scale and variety of existing image datasets because videos are significantly more difficult to collect, annotate and store. To obtain a sufficient amount of data needed to train our CNN architectures, we collected a new Sports-1M dataset, which consists of 1 million YouTube videos belonging to a taxonomy ...
The document proposes two video quality assessment models - a full-reference model that measures structural distortion compared to traditional error-based methods, and a no-reference model for compressed MPEG video. Experimental results on standard test datasets show the full-reference model has higher correlation with subjective quality ratings than previous methods. Preliminary results also show the no-reference model correlates well with the full-reference model for MPEG videos at different bitrates. The models analyze factors like quantization errors, blocking effects, and motion to evaluate video quality.
Video Denoising using Transform Domain MethodIRJET Journal
This document presents a proposed method for video denoising using dictionary learning and transform domain techniques. It begins with an abstract describing how traditional video denoising models based on Gaussian noise do not account for real-world noise sources. The proposed method then learns basis functions adaptively from input video frames using dictionary learning, providing a sparse representation. Hard thresholding is applied in the transform domain to compute denoised frames. Experimental results on standard test videos show the method achieves competitive performance compared to other approaches in terms of peak signal-to-noise ratio.
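The transform-domain hard-thresholding step described above can be sketched with a fixed orthonormal DCT basis. Note the substitution: the paper learns its basis adaptively via dictionary learning, whereas this sketch uses the standard DCT-II, and the threshold value is an illustrative assumption.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (rows are basis vectors).
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def denoise_frame(frame, thresh):
    # Transform the frame, zero small coefficients (hard threshold), invert.
    Dr, Dc = dct_matrix(frame.shape[0]), dct_matrix(frame.shape[1])
    coeffs = Dr @ frame @ Dc.T
    coeffs[np.abs(coeffs) < thresh] = 0.0
    return Dr.T @ coeffs @ Dc
```

With `thresh=0` the round trip is lossless, which is a handy sanity check that the basis really is orthonormal.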
SIGN LANGUAGE INTERFACE SYSTEM FOR HEARING IMPAIRED PEOPLEIRJET Journal
The document describes a proposed sign language interface system for hearing impaired people. The system aims to use machine learning algorithms like convolutional neural networks to classify hand gestures captured by a webcam into corresponding letters or words. The system would preprocess the images, extract features, then use a trained CNN model to predict the sign and output it as text and speech for better understanding by users. The goal is to help bridge communication between deaf/mute and normal people without requiring specialized gloves or sensors.
Scene recognition using Convolutional Neural NetworkDhirajGidde
The document discusses scene recognition using convolutional neural networks. It begins with an abstract stating that scene recognition allows context for object recognition. While object recognition has improved due to large datasets and CNNs, scene recognition performance has not reached the same level of success. The document then discusses using a new scene-centric database called Places with over 7 million images to train CNNs for scene recognition. It establishes new state-of-the-art results on several scene datasets and allows visualization of network responses to show differences between object-centric and scene-centric representations.
A survey on Measurement of Objective Video Quality in Social Cloud using Mach...IRJET Journal
This document discusses using machine learning techniques for objective measurement of video quality in social cloud applications. It proposes a method using convolutional neural networks (CNNs) trained on a large dataset of videos evaluated by human observers to predict video quality. The CNN model extracts quality-related features from videos which are used to train a machine learning model. The model achieved high accuracy in predicting subjective quality outcomes for new videos. Machine learning provides an objective way to assess video quality that can benefit users and service providers by optimizing video streaming quality. The document reviews key concepts like video quality assessment, deep learning and CNNs, and the typical methodology used in machine learning-based video quality measurement.
Similar to TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLUTION INFRARED VIDEOS (20)
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsVictor Morales
K8sGPT is a tool that analyzes and diagnoses Kubernetes clusters. This presentation was used to share the requirements and dependencies to deploy K8sGPT in a local environment.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Comparative analysis between traditional aquaponics and reconstructed aquapon...bijceesjournal
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Signal & Image Processing: An International Journal (SIPIJ) Vol.12, No.2, April 2021
DOI: 10.5121/sipij.2021.12203
TARGET DETECTION AND CLASSIFICATION PERFORMANCE
ENHANCEMENT USING SUPER-RESOLUTION INFRARED VIDEOS
Chiman Kwan, David Gribben and Bence Budavari
Applied Research, LLC, Rockville, Maryland, USA
ABSTRACT
Long range infrared videos such as the Defense Systems Information Analysis Center (DSIAC) videos
usually do not have high resolution. In recent years, there have been significant advancements in video
super-resolution algorithms. Here, we summarize our study on the use of super-resolution videos for target
detection and classification. We observed that super-resolution videos can significantly improve detection
and classification performance. For example, for 3000 m range videos, we were able to improve the average
precision of target detection from 11% (without super-resolution) to 44% (with 4x super-resolution) and the
overall accuracy of target classification from 10% (without super-resolution) to 44% (with 2x
super-resolution).
KEYWORDS
Deep learning, mid-wave infrared (MWIR) videos, target detection and classification, contrast enhancement,
YOLO, ResNet
1. INTRODUCTION
For infrared videos, two groups of target detection algorithms are normally used in the literature.
The first group applies conventional target tracking methods [1]-[4], which normally require the target
locations in the first frame of the videos to be known. The second group uses deep learning algorithms
such as You Only Look Once (YOLO) for optical and infrared videos [5]-[20]. Target locations in the first
frame are not needed; however, training videos are required by these algorithms. Some of these deep
learning algorithms [6]-[16] use compressive measurements directly for target detection and
classification, meaning that no time-consuming reconstruction of the compressive measurements is needed
and hence fast target detection and classification can be achieved.
In long range infrared videos such as the DSIAC dataset, which has videos from 1000 m to 5000
m, the target size is small, the resolution is low, and the video quality is also low. It is therefore
extremely important to apply practical methods that can improve the detection and classification
performance of deep learning methods. In recent years, there has been huge progress in image
super-resolution, and many high-performance algorithms have been developed. It is therefore
interesting to investigate the incorporation of super-resolution videos into target detection and
classification.
In this paper, we present some results on target detection and classification in infrared videos. The
objective is to see how much gain one can obtain by integrating video super-resolution with target
detection and classification algorithms. Our approach consists of the following steps. First, we
apply a state-of-the-art video super-resolution algorithm to enhance the resolution of
the videos by two to four times; we have compared two deep learning algorithms for video
super-resolution. After that, we customize YOLO for target detection and a residual network (ResNet)
for classification. YOLO is responsible for target detection, and the detected target locations are
passed to ResNet for classification. All the deep learning algorithms were trained using videos
from one particular range, and videos from other ranges were used for testing.
In the experiments of this paper, we focus on the DSIAC MWIR videos, which do not have high
resolution and are hence very challenging. Due to the long ranges, the vehicle size is quite small,
which seriously affects the target detection and classification performance. Our contributions
are as follows:
• We investigated the application of a recent deep learning based video super-resolution (VSR)
algorithm to enhance the resolution of the DSIAC videos. Two and four times resolution
enhanced videos were generated.
• We built a YOLO model for target detection and a ResNet model for target classification
using only original resolution videos at 1500 m. We then fed the enhanced resolution videos
into the trained YOLO and ResNet models and generated the detection and classification
statistics using videos at 2000 m, 2500 m, 3000 m, and 3500 m ranges. We observed that the
detection and classification statistics using enhanced videos improved tremendously. In
particular, the intersection over union (IoU) and average precision (AP) statistics
improved considerably at almost all ranges.
Our paper is organized as follows. Section 2 describes the VSR algorithm, target detection and
classification algorithms, performance metrics, and infrared videos. Section 3 summarizes the
experimental results. Finally, some remarks are included in Section 4.
2. METHODS, PERFORMANCE METRICS, AND DATA
The proposed approach consists of three steps. First, we apply VSR to improve the image resolution
by two to four times. Second, we apply YOLO to locate the targets. Third, we apply ResNet to the
YOLO detected outputs and then classify the vehicles. All three steps require training. We used
1500 m range videos to train the above algorithms.
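The three-step pipeline above can be sketched as follows. All function bodies are hypothetical placeholders (the actual system uses ZSM for super-resolution, YOLOv2 for detection, and ResNet-18 for classification); only the data flow is meant to match the text.

```python
# Sketch of the three-step pipeline: VSR -> YOLO detection -> ResNet classification.
# Every function body here is a stand-in, not the paper's actual model.

def super_resolve(frame, scale=2):
    # Placeholder for the ZSM VSR step: here, a trivial nearest-neighbour
    # upsampling that enlarges the frame by `scale` in both dimensions.
    return [[px for px in row for _ in range(scale)]
            for row in frame for _ in range(scale)]

def detect_targets(frame):
    # Placeholder for YOLO: returns bounding boxes as (x, y, w, h).
    return [(0, 0, 2, 2)]

def classify_target(patch):
    # Placeholder for ResNet-18: returns a vehicle class label.
    return "BTR70"

def process_frame(frame, scale=2):
    # Step 1: enhance resolution; Step 2: locate targets;
    # Step 3: crop each detection and classify it.
    sr = super_resolve(frame, scale)
    boxes = detect_targets(sr)
    crops = [[row[x:x + w] for row in sr[y:y + h]] for (x, y, w, h) in boxes]
    return [classify_target(c) for c in crops]
```

In the actual system, each of the three stages is trained separately (all on the 1500 m range videos, as stated above).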
2.1. Video Super Resolution (VSR) Using Deep Learning
In [26], researchers developed a deep learning video super-resolution (VSR) method known as
Zooming Slow-Mo (ZSM) in 2020. ZSM can improve both the resolution of the frames
and the frame rate. There are three key components in ZSM: a feature temporal interpolation
network, a bidirectional deformable ConvLSTM, and a deep reconstruction network [27]. The feature
temporal interpolation network performs frame interpolation, the bidirectional deformable ConvLSTM
aligns and aggregates temporal information, and the deep reconstruction network predicts
and generates the super-resolution video frames. The ZSM architecture is depicted in Figure 1.
Figure 1. ZSM Architecture.
The visual performance of the various super-resolution methods is shown in Figure 2. We
compared ZSM with the Dynamic Upsampling Filter (DUF) [25] and bicubic methods. One can see
that the ZSM method yielded better results than the other methods in terms of clarity.
Figure 2. Comparison of the image qualities of videos at different ranges using different
super-resolution methods.
2.2. YOLO for Target Detection
YOLO and Faster R-CNN are deep learning based object detectors that do not require initial
bounding boxes and can detect multiple objects simultaneously. YOLO [22] and Faster R-CNN [23] have
comparable performance. The input image is normally resized to 448x448. There are 24
convolutional layers and two fully connected layers, and the output is 7x7x30. We used YOLOv2
because it is more accurate than YOLO version 1. The training of YOLO is simple: images
with ground truth target locations are used. We re-trained the last layer of YOLO using the 1500
m range videos; training took approximately 2000 epochs to complete.
Although YOLO has a built-in classification module, its classification accuracy is not as good as
that of ResNet [5]-[6].
2.3. ResNet for Target Classification
YOLO has been widely used for detecting objects such as humans, traffic signs, vehicles, and buses.
Its built-in classifier is, however, not so good for intra-class (e.g. BTR70 vs. BMP2) discrimination.
The ResNet-18 model [24] is an 18-layer convolutional neural network (CNN) whose residual (skip)
connections help avoid performance saturation when training deeper networks.
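The core idea behind ResNet's skip connections can be illustrated with a minimal sketch (a toy vector version for intuition only, not the actual 18-layer network):

```python
def residual_block(x, transform):
    # A residual block computes transform(x) and adds the input back in
    # (the "skip connection"). If the transform learns to output zeros,
    # the block degenerates to the identity mapping, which is what lets
    # very deep stacks train without performance saturation.
    return [xi + ti for xi, ti in zip(x, transform(x))]
```

For example, with a transform that outputs all zeros, the block simply passes its input through unchanged.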
The relationship between YOLO and ResNet in our paper is as follows. YOLO
[22] was used to determine where the vehicles were located in each frame. YOLO generated
bounding boxes for those vehicles, and these boxes were used to crop the vehicles from the image. The
cropped vehicles were then fed into ResNet-18 for classification. That is, ResNet-18 is
applied directly after the bounding box information is obtained from YOLO.
Training of ResNet requires target patches. The targets are cropped from the training videos, and
mirror images are then created. Data augmentation using scaling, rotation (every 45 degrees), and
illumination variations was used to generate more training data; for each cropped target, we created
64 more images. We re-trained the last layer of the ResNet model until the validation score
reached a steady-state value.
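One way the 64 extra images per target could be generated is by enumerating combinations of the transforms listed above. The factor counts below (2 mirror states x 8 rotations x 2 scales x 2 illumination gains = 64) are our assumption for illustration; the paper does not give the exact breakdown.

```python
from itertools import product

# Assumed augmentation factors (hypothetical breakdown that multiplies to 64):
MIRRORS = (False, True)                # original and mirrored
ROTATIONS = tuple(range(0, 360, 45))   # every 45 degrees -> 8 angles
SCALES = (0.9, 1.1)                    # two scale factors
ILLUMINATIONS = (0.8, 1.2)             # two brightness gains

def augmentation_plan():
    """Enumerate the (mirror, angle, scale, gain) settings to apply
    to each cropped target patch."""
    return list(product(MIRRORS, ROTATIONS, SCALES, ILLUMINATIONS))
```

Each tuple in the plan would then be applied to a cropped patch by an image library of choice.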
2.4. Performance Metrics for Assessing Target Detection and Classification Performance
The six performance metrics used to quantify the detection performance are: Center Location
Error (CLE), Distance Precision at 10 pixels (DP@10), Estimates in Ground Truth (EinGT),
Intersection over Union (IoU), Average Precision (AP), and the number of frames with detection.
These metrics are as follows:
Center Location Error (CLE): This is the error between the center of the detected bounding box and
the center of the ground-truth bounding box; smaller means better. CLE is calculated by measuring the
distance between the ground truth center location $(C_{x,gt}, C_{y,gt})$ and the detected center
location $(C_{x,est}, C_{y,est})$. Mathematically, CLE is given by
$CLE = \sqrt{(C_{x,est} - C_{x,gt})^2 + (C_{y,est} - C_{y,gt})^2}$. (1)
Distance Precision (DP): This is the percentage of frames where the centroids of detected
bounding boxes are within 10 pixels of the centroid of ground-truth bounding boxes. Close
to 1 or 100% indicates good results.
Estimates in Ground Truth (EinGT): This is the percentage of the frames where the
centroids of the detected bounding boxes are inside the ground-truth bounding boxes. It
depends on the size of the bounding box and is simply a less strict version of the DP metric.
Close to 1 or 100% indicates good results.
Intersection over Union (IoU): This is the ratio of the intersection area to the area of the union
of the estimated and ground truth bounding boxes:
$IoU = \dfrac{\text{Area of Intersection}}{\text{Area of Union}}$. (2)
Average Precision (AP): AP is the ratio between the intersection area and the area of the
estimated bounding box; the value is between 0 and 1, with 1 or 100% being perfect
overlap. The AP used here can be computed as
$AP = \dfrac{\text{Area of Intersection}}{\text{Area of estimated bounding box}}$. (3)
Number of frames with detection: This is the total number of frames that have detection.
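The box-level metrics above can be sketched in a few lines, assuming axis-aligned boxes given as (x, y, w, h) with (x, y) the top-left corner (a representational assumption; the paper does not specify its box format):

```python
import math

def center(box):
    # Center of an (x, y, w, h) box.
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def cle(est, gt):
    # Center Location Error, Eq. (1): distance between box centers.
    (ex, ey), (gx, gy) = center(est), center(gt)
    return math.hypot(ex - gx, ey - gy)

def intersection_area(a, b):
    # Overlap area of two axis-aligned boxes (0 if disjoint).
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    return iw * ih

def iou(est, gt):
    # Intersection over Union, Eq. (2).
    inter = intersection_area(est, gt)
    union = est[2] * est[3] + gt[2] * gt[3] - inter
    return inter / union if union else 0.0

def ap(est, gt):
    # AP as defined in Eq. (3): intersection area over the estimated
    # box area. Note this is the paper's per-box definition, not the
    # PASCAL VOC precision-recall AP.
    return intersection_area(est, gt) / (est[2] * est[3])
```

DP@10 and EinGT then follow by thresholding `cle` (at 10 pixels) or testing whether the estimated center falls inside the ground-truth box, and averaging over frames.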
Confusion matrices were used to evaluate the vehicle classification performance of ResNet.
From the confusion matrix, we can also compute the overall accuracy (OA), average accuracy (AA),
and kappa coefficient.
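These summary statistics can be computed directly from a confusion matrix. In the sketch below, rows are assumed to be true classes and columns predicted classes (the paper does not state its convention):

```python
def confusion_stats(cm):
    """Return (OA, AA, kappa) from a square confusion matrix given as a
    list of rows, with rows = true class and columns = predicted class."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    diag = sum(cm[i][i] for i in range(k))
    oa = diag / n                                   # overall accuracy
    aa = sum(cm[i][i] / sum(cm[i]) for i in range(k)) / k  # mean per-class recall
    # Cohen's kappa: agreement corrected for chance agreement pe.
    pe = sum(sum(cm[i]) * sum(r[i] for r in cm) for i in range(k)) / n ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```

Applied to the perfect-diagonal 1500 m matrix in Table 2, this yields OA and AA near 100% and kappa near 1, consistent with the reported values.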
2.5. DSIAC Data
We selected five vehicles in the DSIAC videos for detection and classification. There are optical
and mid-wave infrared (MWIR) videos collected at distances ranging from 1000 m to 5000 m in
500 m increments. The five types of vehicles are shown in Figure 3. These videos are challenging
for several reasons. First, the target sizes are small due to the long ranges; this is very different from
benchmark datasets such as the MOT Challenge [21], where the range is short and the targets are
big. Second, the target orientations change drastically. Third, the illumination differs between
videos. Fourth, the camera moves in some videos.
In this research, we focus mainly on the MWIR night-time videos because MWIR is more effective
for surveillance at night. The video frame rate is 7 frames/second, the image size is
640x512, and each video contains 1800 frames. Each pixel is represented by 8 bits. All
frames are contrast enhanced using some reference frames from the 1500 m range videos.
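The paper does not specify its contrast-enhancement method; one plausible minimal version is a linear stretch to the 8-bit range using intensity limits (lo, hi) taken from a reference frame:

```python
def contrast_stretch(frame, lo, hi):
    # Hedged sketch: linearly map [lo, hi] (limits drawn from a reference
    # frame) to the full 8-bit range [0, 255], clamping out-of-range pixels.
    span = max(hi - lo, 1)
    return [[min(255, max(0, round((px - lo) * 255 / span))) for px in row]
            for row in frame]
```

This is only one reading of "contrast enhanced using some reference frames"; histogram matching against the reference frame would be another.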
Figure 3. Five vehicles in DSIAC: (a) BTR70; (b) BRDM2; (c) BMP2; (d) T72; and (e) ZSU23-4.
3. EXPERIMENTS
3.1. Baseline Results Using Original Resolution Videos
Here, baseline means that the YOLO and ResNet models trained using the 1500 m videos were tested
on the original resolution videos. These baseline performance metrics are used for comparison
against the results obtained with SR videos. Results are available for five distances: 1500 m,
2000 m, 2500 m, 3000 m, and 3500 m. Table 1 contains the YOLO detection statistics for
each distance, while Table 2 contains the ResNet confusion matrices and classification statistics.
There is an obvious deterioration in accuracy as the vehicle distance moves away from 1500 m, the
distance the models were trained on.
From Table 1 and Table 2, each metric trends worse as the range moves further from the trained 1500
m distance. This trend is seen across both the detection and classification statistics, and the
overall degradation in accuracy away from the trained distance is quite extreme. For
example, for detection, the AP value, which measures the amount of overlap between the ground truth
and detected bounding boxes, roughly halves with each 500 m increase in range; at the final
distance, 3500 m, it is only about a quarter of the value at the previous distance.
Table 1. Baseline YOLO detection results using original resolution videos. The metrics are named as
follows: Center Location Error (CLE), Distance Precision (DP), Estimates in Ground Truth (EinGT),
Intersection over Union (IoU), Average Precision (AP), and Detection Percentage (% det.).
1500 m CLE DP EinGT IoU AP % det.
BTR70 1.201 100.00% 100.00% 70.56% 70.60% 91.17%
BRDM2 1.279 100.00% 100.00% 78.54% 78.69% 91.06%
BMP2 1.092 100.00% 100.00% 87.70% 88.96% 91.06%
T72 1.497 100.00% 100.00% 85.21% 86.25% 91.11%
ZSU23-4 1.233 100.00% 100.00% 77.58% 77.75% 90.00%
Avg 1.260 100.00% 100.00% 79.92% 80.45% 90.88%
2000 m CLE DP EinGT IoU AP % det.
BTR70 1.861 100.00% 100.00% 30.64% 30.64% 93.44%
BRDM2 3.023 100.00% 100.00% 37.74% 37.75% 90.50%
BMP2 3.542 100.00% 100.00% 58.01% 58.69% 41.83%
T72 2.276 100.00% 100.00% 39.80% 39.80% 98.44%
ZSU23-4 8.953 97.83% 97.83% 38.11% 38.11% 84.56%
Avg 3.931 99.57% 99.57% 40.86% 41.00% 81.76%
2500 m CLE DP EinGT IoU AP % det.
BTR70 2.374 100.00% 100.00% 13.43% 13.43% 91.17%
BRDM2 3.68 100.00% 99.26% 16.44% 16.44% 90.50%
BMP2 24.834 89.79% 89.79% 23.42% 23.42% 75.61%
T72 3.253 99.95% 99.95% 20.99% 20.99% 78.89%
ZSU23-4 2.688 100.00% 100.00% 17.37% 17.37% 73.39%
Avg 7.366 97.95% 97.80% 18.33% 18.33% 81.91%
3000 m CLE DP EinGT IoU AP % det.
BTR70 1.995 100.00% 99.55% 7.09% 7.09% 49.00%
BRDM2 3.499 100.00% 99.38% 10.33% 10.33% 27.06%
BMP2 3.91 100.00% 98.48% 17.31% 17.31% 3.61%
T72 4.541 100.00% 77.02% 11.86% 11.86% 32.56%
ZSU23-4 2.18 100.00% 99.67% 11.25% 11.25% 15.67%
Avg 3.225 100.00% 94.82% 11.57% 11.57% 25.58%
3500 m CLE DP EinGT IoU AP % det.
BTR70 1.689 100.00% 83.52% 2.16% 2.16% 36.89%
BRDM2 3.112 99.48% 74.55% 2.65% 2.65% 42.56%
BMP2 1.886 100.00% 100.00% 3.80% 3.80% 0.17%
T72 3.975 99.86% 55.59% 2.84% 2.84% 36.39%
ZSU23-4 2.541 100.00% 69.80% 2.73% 2.73% 39.17%
Avg 2.641 99.87% 76.69% 2.84% 2.84% 31.03%
Table 2. Baseline ResNet classification results using original resolution videos. Confusion matrices with
Overall Accuracy (OA), Average Accuracy (AA), and kappa.
1500 m 5 6 9 11 12
BTR70 1849 0 0 0 2
BRDM2 0 1808 0 0 0
BMP2 0 0 1800 0 0
T72 0 0 0 1829 0
ZSU23-4 0 0 0 0 1882
Class Stats OA 99.98% AA 99.98% kappa 1.00
2000 m 5 6 9 11 12
BTR70 1511 49 167 56 84
BRDM2 0 1834 18 12 37
BMP2 7 30 715 0 2
T72 15 272 159 1739 95
ZSU23-4 0 90 191 0 1472
Class Stats OA 84.99% AA 86.50% kappa 0.85
2500 m 5 6 9 11 12
BTR70 43 608 313 516 434
BRDM2 0 1800 19 44 20
BMP2 0 19 1347 47 105
T72 2 98 561 938 378
ZSU23-4 65 236 391 662 554
Class Stats OA 50.89% AA 52.61% kappa 0.51
3000 m 5 6 9 11 12
BTR70 1 69 271 12 540
BRDM2 0 13 59 1 414
BMP2 0 3 52 0 11
T72 0 461 68 27 136
ZSU23-4 2 56 56 29 156
Class Stats OA 10.22% AA 27.53% kappa 0.10
3500 m 5 6 9 11 12
BTR70 48 31 368 65 192
BRDM2 36 6 331 113 284
BMP2 0 0 2 0 1
T72 8 218 113 354 5
ZSU23-4 4 10 491 172 78
Class Stats OA 16.66% AA 27.06% kappa 0.17
3.2. YOLO and ResNet Performance Using 2x Super-Resolution Videos
Here, we investigate the detection and classification performance when using 2x super-resolution
videos in the testing stage. The goal is to see if there is an improvement over the baseline
results, which used the original resolution videos. The 1500 m range videos (original resolution)
were used for training and the other ranges for testing. Table 3 shows the detection results for
distances from 2000 m through 3500 m, and Table 4 shows the confusion matrices and classification
results for the same distances.
The detection results shown in Table 3 are improved by a good margin compared to those
using the original resolution videos. The largest improvement is seen at the farther distances,
mostly in the IoU and AP results. In general, this makes sense for the same
reasons mentioned when looking at the VSR cropped results shown in Figure 2. The other thing to
examine is how the metrics change at 3000 m: the reduction in accuracy is a little less than
two times in the IoU and AP values, and there is no reduction in CLE (when taking the pixel
difference into account), DP, or EinGT. Unfortunately, the detection percentage suffers
considerably.
The classification results present differently than the detection results. In Table 4, the largest
improvements in OA, AA, and kappa are seen at the farther distances. Compared with the baseline
classification results, a slight improvement is seen at 2000 m, but by 3000 m there is a four times
improvement and at 3500 m a slightly greater than two times improvement. It is also interesting
that, for classification, this 2x version of VSR even outperforms the higher resolution 4x case.
The confusion matrix for 3000 m has fewer data points due to the decrease in detection percentage,
but the metrics are around 30 percent more accurate.
Table 3. YOLO detection metrics for distances 2000, 2500, 3000, and 3500 meters using videos
that have two times better resolution.
2000 m CLE DP EinGT IoU AP % det.
BTR70 2.596 100.00% 100.00% 70.39% 72.30% 99.39%
BRDM2 6.459 99.93% 100.00% 61.81% 64.27% 71.78%
BMP2 4.347 99.65% 100.00% 67.15% 70.55% 46.72%
T72 3.626 100.00% 100.00% 65.27% 67.00% 85.17%
ZSU23-4 3.791 100.00% 100.00% 70.96% 76.58% 87.06%
Avg 4.164 99.92% 100.00% 67.12% 70.14% 78.02%
2500 m CLE DP EinGT IoU AP % det.
BTR70 4.652 99.89% 99.67% 24.89% 24.89% 49.06%
BRDM2 7.250 97.03% 95.91% 32.88% 32.92% 13.78%
BMP2 3.810 100.00% 100.00% 53.68% 54.46% 79.28%
T72 5.198 99.85% 100.00% 35.46% 35.70% 33.50%
ZSU23-4 3.829 100.00% 100.00% 45.13% 45.28% 56.17%
Avg 4.948 99.35% 99.12% 38.41% 38.65% 46.36%
3000 m CLE DP EinGT IoU AP % det.
BTR70 4.143 100.00% 99.49% 11.93% 11.93% 19.78%
BRDM2 5.367 99.27% 93.82% 19.67% 19.67% 13.44%
BMP2 2.838 100.00% 100.00% 39.86% 39.86% 11.83%
T72 6.636 99.28% 99.28% 20.77% 20.78% 14.28%
ZSU23-4 2.512 100.00% 100.00% 22.31% 22.31% 30.72%
Avg 4.299 99.71% 98.52% 22.91% 22.91% 18.01%
3500 m CLE DP EinGT IoU AP % det.
BTR70 3.276 100.00% 87.18% 4.66% 4.66% 47.83%
BRDM2 4.167 99.64% 94.71% 6.86% 6.86% 43.39%
BMP2 3.447 100.00% 95.87% 10.96% 10.96% 19.44%
T72 3.721 99.90% 80.93% 8.14% 8.14% 42.89%
ZSU23-4 2.739 100.00% 93.63% 7.48% 7.48% 76.78%
Avg 3.470 99.91% 90.46% 7.62% 7.62% 46.07%
Table 4. ResNet Confusion matrices and classification metrics for 2000 through 3500 meters
using 2x super-resolution videos.
2000 m 5 6 9 11 12
BTR70 1430 0 90 89 326
BRDM2 4 1275 13 11 134
BMP2 0 8 482 44 316
T72 17 189 14 1296 152
ZSU23-4 1 54 32 63 1649
Class Stats OA 88.92% AA 90.23% kappa 0.8892
2500 m 5 6 9 11 12
BTR70 163 176 197 140 243
BRDM2 0 237 9 17 6
BMP2 0 0 1461 2 43
T72 0 25 188 321 125
ZSU23-4 3 6 59 114 886
Class Stats OA 69.40% AA 66.90% kappa 0.694
3000 m 5 6 9 11 12
BTR70 22 21 63 8 275
BRDM2 15 29 65 1 165
BMP2 4 0 215 0 5
T72 0 143 51 48 35
ZSU23-4 0 20 74 56 484
Class Stats OA 44.36% AA 41.17% kappa 0.4436
3500 m 5 6 9 11 12
BTR70 120 108 679 13 71
BRDM2 6 75 427 0 323
BMP2 1 3 359 0 0
T72 0 379 260 252 100
ZSU23-4 0 119 447 204 893
Class Stats OA 35.11% AA 39.83% kappa 0.3511
3.3. YOLO Performance Enhancement Using 4x Video Super-Resolution
Table 5 and Table 6 show the results for the 4x VSR videos cropped to the original video size. We
only focused on the 3000 m range because it is difficult to discern any objects beyond this range.
It is observed that while the CLE value looks worse for the 3000 m distance than in the baseline
results, the VSR frames have four times as many pixels in each dimension. When divided by four, the
average CLE is 1.905 pixels, which is almost half the 3.225 pixel average for the baseline. The same
applies to the DP value, which uses a static 20 pixel distance. Otherwise, the results are greatly
improved for both detection and classification, with IoU and AP being four times more accurate.
Compared with the baseline results, the OA and kappa are three times more accurate.
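The CLE rescaling argument above can be checked numerically with the per-class values from Table 5, dividing by the 4x upsampling factor to express the error in original-resolution pixels:

```python
# Per-class 4x-SR CLE values from Table 5, in super-resolved pixels.
cle_4x = {"BTR70": 9.22, "BRDM2": 6.403, "BMP2": 6.481,
          "T72": 7.221, "ZSU23-4": 8.719}
SCALE = 4  # 4x super-resolution

# Convert to original-resolution pixels and average over the five vehicles.
cle_original = {k: v / SCALE for k, v in cle_4x.items()}
avg_cle = sum(cle_original.values()) / len(cle_original)
# avg_cle comes out to about 1.90, matching the paper's reported 1.905
# up to rounding, and roughly half the 3.225 baseline average.
```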
Table 5. 3000 m range YOLO target detection results using 4x super-resolution videos.
3000 m CLE DP EinGT IoU AP % det.
BTR70 9.22 93.22% 98.18% 47.95% 48.91% 30.00%
BRDM2 6.403 99.30% 100.00% 55.49% 56.42% 54.83%
BMP2 6.481 98.67% 99.67% 40.12% 40.36% 59.61%
T72 7.221 100.00% 100.00% 33.11% 33.22% 57.61%
ZSU23-4 8.719 99.87% 100.00% 43.18% 43.45% 40.33%
Avg 7.61 98.21% 99.57% 43.97% 44.47% 48.48%
Table 6. 3000 m range ResNet confusion matrix and classification metrics using 4x
super-resolution videos.
3000 5 6 9 11 12
BTR70 139 259 44 363 390
BRDM2 3 260 115 28 375
BMP2 2 38 543 265 287
T72 3 391 16 143 52
ZSU23-4 2 260 135 260 542
Class Stats OA 30.04% AA 31.40% kappa 0.3004
3.4. Performance Comparison between Original Resolution and Super-Resolution Videos
Here, we compare the YOLO detection results for the 3000 m range scenario in several cases.
Tables 7(a), (b), and (c) show the baseline YOLO results, the YOLO results using 2x super-resolution
videos, and the YOLO results using 4x super-resolution videos, respectively. Comparing
Table 7(a) and Table 7(b), one can see that the YOLO results with 2x super-resolution videos are
two times better in terms of IoU and AP, and the 4x SR videos are better still. This clearly
demonstrates that SR videos can help YOLO performance.
For comparing the ResNet classification results, we focus on the 3000 m videos. Tables 8(a), (b),
and (c) show the results of the baseline ResNet, ResNet using 2x SR videos, and ResNet using 4x SR
videos, respectively. The best performing case is the one using 2x SR videos. However, even in the
2x SR case, the OA and AA values are still quite low. This means that SR can only improve the
performance to a certain extent; for further improvement, the camera itself will need to be improved.
Table 7. Comparison of YOLO results for several cases using 3000 m MWIR nighttime
videos with different resolutions.
(a) YOLO baseline
3000 m CLE DP EinGT IoU AP % det.
BTR70 1.995 100.00% 99.55% 7.09% 7.09% 49.00%
BRDM2 3.499 100.00% 99.38% 10.33% 10.33% 27.06%
BMP2 3.91 100.00% 98.48% 17.31% 17.31% 3.61%
T72 4.541 100.00% 77.02% 11.86% 11.86% 32.56%
ZSU23-4 2.18 100.00% 99.67% 11.25% 11.25% 15.67%
Avg 3.225 100.00% 94.82% 11.57% 11.57% 25.58%
(b) YOLO results using 2x VSR
3000 m CLE DP EinGT IoU AP % det.
BTR70 4.143 100.00% 99.49% 11.93% 11.93% 19.78%
BRDM2 5.367 99.27% 93.82% 19.67% 19.67% 13.44%
BMP2 2.838 100.00% 100.00% 39.86% 39.86% 11.83%
T72 6.636 99.28% 99.28% 20.77% 20.78% 14.28%
ZSU23-4 2.512 100.00% 100.00% 22.31% 22.31% 30.72%
Avg 4.299 99.71% 98.52% 22.91% 22.91% 18.01%
(c) YOLO results using 4x VSR
3000 m CLE DP EinGT IoU AP % det.
BTR70 9.22 93.22% 98.18% 47.95% 48.91% 30.00%
BRDM2 6.403 99.30% 100.00% 55.49% 56.42% 54.83%
BMP2 6.481 98.67% 99.67% 40.12% 40.36% 59.61%
T72 7.221 100.00% 100.00% 33.11% 33.22% 57.61%
ZSU23-4 8.719 99.87% 100.00% 43.18% 43.45% 40.33%
Avg 7.61 98.21% 99.57% 43.97% 44.47% 48.48%
Table 8. Comparison of 3000 m ResNet confusion matrix and classification metrics for
videos with different resolutions.
(a) Baseline ResNet
3000 m 5 6 9 11 12
BTR70 1 69 271 12 540
BRDM2 0 13 59 1 414
BMP2 0 3 52 0 11
T72 0 461 68 27 136
ZSU23-4 2 56 56 29 156
Class Stats OA 10.22% AA 27.53% kappa 0.10
(b) ResNet 2x VSR
3000 m 5 6 9 11 12
BTR70 22 21 63 8 275
BRDM2 15 29 65 1 165
BMP2 4 0 215 0 5
T72 0 143 51 48 35
ZSU23-4 0 20 74 56 484
Class Stats OA 44.36% AA 41.17% kappa 0.4436
(c) ResNet 4x VSR
3000 m 5 6 9 11 12
BTR70 139 259 44 363 390
BRDM2 3 260 115 28 375
BMP2 2 38 543 265 287
T72 3 391 16 143 52
ZSU23-4 2 260 135 260 542
Class Stats OA 30.04% AA 31.40% kappa 0.3004
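The Class Stats rows in Table 8 (OA, AA, kappa) are standard confusion-matrix summaries: OA is the trace divided by the total count, AA is the mean per-class recall, and kappa corrects OA for chance agreement. A minimal sketch, using a hypothetical 3-class matrix rather than any of the tables above (rows taken as ground truth, columns as predictions):

```python
def confusion_stats(cm):
    """Overall accuracy (OA), average accuracy (AA), and Cohen's kappa
    from a square confusion matrix (rows = true class, cols = predicted)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diag = [cm[i][i] for i in range(n)]
    oa = sum(diag) / total
    aa = sum(diag[i] / sum(cm[i]) for i in range(n)) / n
    # expected agreement under chance, from row and column marginals
    col = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    pe = sum(col[i] * sum(cm[i]) for i in range(n)) / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

# hypothetical 3-class confusion matrix for illustration
cm = [[50, 5, 5],
      [10, 30, 10],
      [5, 5, 40]]
oa, aa, kappa = confusion_stats(cm)
```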
4. CONCLUSIONS AND FUTURE DIRECTIONS
In this paper, we have presented a framework for target detection and classification using long
range and low quality infrared videos. The framework consists of video super-resolution using a
state-of-the-art deep learning algorithm, a proven target detector using YOLO, and a customized
target classifier using ResNet. The integrated framework significantly improved the target
detection and classification performance using actual infrared videos. In particular, we have
observed that both the 2x and 4x super-resolution videos improved the YOLO detection and
ResNet classification performance.
One future direction is to investigate the use of YOLO-v3, which is a new version of YOLO, for
target detection. We will also investigate better training strategies to handle long range target
detection and classification applications.
CONFLICT OF INTEREST
The authors declare no conflict of interest.
ACKNOWLEDGEMENTS
This research was supported by the US Army under contract W909MY-20-P-0024. The views,
opinions and/or findings expressed are those of the author and should not be interpreted as
representing the official views or policies of the Department of Defense or the U.S. Government.
AUTHORS
Chiman Kwan received his Ph.D. degree in electrical engineering from the University of Texas at Arlington
in 1993. He has authored one book, four book chapters, 380 technical papers in journals and conferences,
and 550 technical reports, and holds 15 patents with 75 invention disclosures. Over the past 25 years, he has
been the PI/Program Manager of over 120 diverse projects with total funding exceeding 36 million dollars.
He is also the founder and Chief Technology Officer of Signal Processing, Inc. and Applied Research LLC.
He has received numerous awards from IEEE, NASA, and other agencies, and has given keynote speeches at
several international conferences. He is an Associate Editor of IEEE Trans. Geoscience and Remote Sensing.
David Gribben received his B.S. in Computer Science and Physics from McDaniel College, Maryland,
USA, in 2015. He is a software engineer at ARLLC. He has been involved in diverse projects, including
mission planning for UAVs, target detection and classification, and remote sensing.
Bence Budavari received his B.S. in Audio Engineering from Belmont University in 2015. He is a software
developer at ARLLC.