It is challenging to detect vehicles at long range in low quality infrared videos using deep learning techniques such as You Only Look Once (YOLO), mainly due to small target size: small targets do not have detailed texture information. This paper focuses on practical approaches to target detection in infrared videos using deep learning techniques. We first investigated a newer version of You Only Look Once (YOLO v4). We then proposed a practical and effective approach: training the YOLO model using videos captured at longer ranges. Experimental results using real infrared videos at ranges from 1000 m to 3500 m demonstrated large performance improvements. In particular, the average detection percentage over the six ranges from 1000 m to 3500 m improved from 54% when the 1500 m videos were used for training to 95% when the 3000 m videos were used for training.
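To make the evaluation concrete, the sketch below (not the authors' code; the box format and data variables are hypothetical) shows how a per-range detection percentage can be computed by IoU-matching detector output against ground truth:

```python
def iou(a, b):
    # Intersection-over-union of two boxes in (x1, y1, x2, y2) format.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def detection_percentage(gt_by_frame, pred_by_frame, iou_thr=0.5):
    # Percentage of ground-truth targets matched by a detection with
    # IoU >= iou_thr, accumulated over all frames of one range's videos.
    hits = total = 0
    for gts, preds in zip(gt_by_frame, pred_by_frame):
        total += len(gts)
        hits += sum(any(iou(g, p) >= iou_thr for p in preds) for g in gts)
    return 100.0 * hits / max(total, 1)

# per_range = {r: detection_percentage(gt[r], preds[r])
#              for r in (1000, 1500, 2000, 2500, 3000, 3500)}
```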
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU... (sipij)
Long range infrared videos, such as the Defense Systems Information Analysis Center (DSIAC) videos, usually do not have high resolution. In recent years, there have been significant advances in video super-resolution algorithms. Here, we summarize our study on the use of super-resolution videos for target detection and classification. We observed that super-resolution videos can significantly improve detection and classification performance. For example, for 3000 m range videos, we were able to improve the average precision of target detection from 11% (without super-resolution) to 44% (with 4x super-resolution) and the overall accuracy of target classification from 10% (without super-resolution) to 44% (with 2x super-resolution).
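The abstract does not name the super-resolution network used, so the sketch below substitutes bicubic interpolation purely as a stand-in for the 4x upscaling step applied before detection:

```python
import cv2

def upscale_before_detection(frame, scale=4):
    # Stand-in for a learned video super-resolution model: bicubic
    # upscaling by the same factor the study evaluates (4x here).
    return cv2.resize(frame, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_CUBIC)

# frame = cv2.imread("dsiac_frame.png")          # hypothetical frame
# sr = upscale_before_detection(frame, scale=4)  # feed `sr` to the detector
```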
In this paper, an attempt has been made to extract texture features from facial images using an improved Illumination Invariant Feature Descriptor. The proposed local ternary pattern based feature extractor, viz. the Steady Illumination Local Ternary Pattern (SIcLTP), has been used to extract texture features from the Indian face database. Similarity matching between two extracted feature sets is obtained using the Zero Mean Sum of Squared Differences (ZSSD). The RGB facial images are first converted into the YIQ colour space to reduce the redundancy of the RGB representation. The results are analysed using the Receiver Operating Characteristic (ROC) curve and found to be promising. Finally, the results are validated against the standard local binary pattern (LBP) extractor.
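For orientation, a generic 3x3 local ternary pattern and the ZSSD score can be sketched as follows; the paper's SIcLTP variant adds an illumination-steadying step that is not reproduced here:

```python
import numpy as np

def local_ternary_pattern(gray, t=5):
    # Code each neighbour as +1/0/-1 against the centre with tolerance t,
    # then split the ternary code into 'upper' and 'lower' binary patterns.
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    upper = np.zeros_like(c)
    lower = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        n = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        upper |= (n >= c + t).astype(np.int32) << bit
        lower |= (n <= c - t).astype(np.int32) << bit
    return upper, lower

def zssd(f1, f2):
    # Zero Mean Sum of Squared Differences between two feature vectors.
    a = f1.astype(np.float64) - f1.mean()
    b = f2.astype(np.float64) - f2.mean()
    return float(np.sum((a - b) ** 2))
```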
Depth Estimation from Defocused Images: a Survey (IJAAS Team)
An important step in 3D data generation is the generation of a depth map. A depth map is a grayscale image, exactly the same size as the original captured 2D image, that indicates the relative distance of each pixel from the observer to the objects in the real world. This paper presents a survey of depth perception from defocused or blurred images as well as from motion. The distance of an object from the camera is directly related to the amount of blur on that object in the image. The amount of blur can be estimated by comparison with objects directly in front of the camera and observed as changes in gray level around object edges.
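A minimal sketch of the underlying idea, using variance of the Laplacian as a crude sharpness measure (one of many possible metrics, not one prescribed by the survey):

```python
import cv2

def edge_blur_score(gray):
    # Variance of the Laplacian: sharp edges give high variance, defocus
    # lowers it, and depth-from-defocus methods relate that drop to the
    # object's distance from the focal plane.
    return cv2.Laplacian(gray, cv2.CV_64F).var()

# img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
# print(edge_blur_score(img))
```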
Enhance Example-Based Super Resolution to Achieve Fine Magnification of Low ... (IJMER)
This paper proposes an enhanced example-based super resolution method to achieve fine magnification of low resolution images using a neighbor embedding method. The proposed method has two phases: 1) A dictionary construction phase that extracts patch pairs from high and low resolution images and stores them in a dictionary. K-means trees are used to organize the patches. 2) A super resolution phase that searches the dictionary using neighbor embedding to find the best matching high resolution patch to synthesize the output image. Non-local mean filtering is then applied to remove blurring. Experimental results on text images show the proposed method improves image quality over other techniques by reducing blurring and artifacts.
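A compact sketch of the neighbour-embedding step (LLE-style reconstruction weights; the k-means tree index and the non-local means post-filter are omitted):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighbor_embedding_patch(lr_patch, lr_dict, hr_dict, k=5):
    # Find the k nearest low-resolution dictionary patches, solve for
    # sum-to-one reconstruction weights, and apply the same weights to
    # the paired high-resolution patches.
    nn = NearestNeighbors(n_neighbors=k).fit(lr_dict)
    _, idx = nn.kneighbors(lr_patch[None, :])
    neighbors = lr_dict[idx[0]]                # (k, d_lr)
    diff = neighbors - lr_patch
    G = diff @ diff.T + 1e-8 * np.eye(k)       # regularised Gram matrix
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()
    return w @ hr_dict[idx[0]]                 # weighted sum of HR patches
```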
MULTIMODAL BIOMETRICS RECOGNITION FROM FACIAL VIDEO VIA DEEP LEARNING (csandit)
Biometric identification using multiple modalities has attracted the attention of many researchers, as it produces more robust and trustworthy results than single modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing the different modalities present in the facial video clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising autoencoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers to perform the multimodal recognition. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) achieved rank-1 recognition rates of 99.17% and 97.14%, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips.
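A minimal denoising autoencoder in PyTorch illustrates the feature-learning step; the dimensions and noise level are illustrative, and the paper's supervised variant and modality pipeline are not reproduced:

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    # Trained to reconstruct a clean input from a noise-corrupted copy;
    # the bottleneck activations then serve as learned features.
    def __init__(self, in_dim=4096, hidden=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden, in_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_step(model, opt, clean, noise_std=0.1):
    noisy = clean + noise_std * torch.randn_like(clean)  # corrupt the input
    loss = nn.functional.mse_loss(model(noisy), clean)   # reconstruct clean
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```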
This doctoral dissertation examines facial skin motion properties from video for modeling and applications. It presents two methods for computing strain patterns from video: a finite difference method and a finite element method. The finite element method incorporates material properties of facial tissues by modeling their Young's modulus values. Experiments show strain patterns are discriminative and stable features for facial expression recognition, age estimation, and person identification. The dissertation also develops a method for expression invariant face matching by modeling Young's modulus from multiple expressions.
A Survey on Deblur The License Plate Image from Fast Moving Vehicles Using Sp... (IRJET Journal)
The document discusses challenges in deblurring license plate images from fast-moving vehicles. It proposes using sparse representation techniques to estimate blur kernels and deconvolve blurred license plate images. A block diagram shows blurred license plates being processed through kernel regression, orientation estimation, deconvolution and Wiener filtering to recover a sharp license plate image that can be recognized. The goal is to develop a method that can handle large blur kernels and improve license plate recognition from blurred images of fast-moving vehicles.
Video Key-Frame Extraction using Unsupervised Clustering and Mutual Comparison (CSCJournals)
The document presents a novel method for extracting key frames from videos using unsupervised clustering and mutual comparison. It assigns weights of 70% to color (HSV histogram) and 30% to texture (GLCM) when computing frame similarity for clustering. It then performs mutual comparison of extracted key frames to remove near duplicates, improving accuracy. The algorithm is computationally simple and able to detect unique key frames, improving concept detection performance as validated on open databases.
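A sketch of the weighted frame similarity, using the 70% colour / 30% texture split stated above (descriptor details such as bin counts and the GLCM properties are assumptions):

```python
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def frame_features(bgr):
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256])
    hist = cv2.normalize(hist, None).flatten()          # colour descriptor
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    glcm = graycomatrix(gray, [1], [0], levels=256, symmetric=True, normed=True)
    texture = np.array([graycoprops(glcm, p)[0, 0]
                        for p in ("contrast", "homogeneity", "energy")])
    return hist, texture

def frame_similarity(f1, f2, w_color=0.7, w_texture=0.3):
    # Weighted similarity used for clustering: 70% colour, 30% texture.
    color_sim = cv2.compareHist(f1[0], f2[0], cv2.HISTCMP_CORREL)
    texture_sim = 1.0 / (1.0 + np.linalg.norm(f1[1] - f2[1]))
    return w_color * color_sim + w_texture * texture_sim
```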
IRJET- A Survey on the Enhancement of Video Action Recognition using Semi-Sup... (IRJET Journal)
This document summarizes several papers related to enhancing video action recognition using semi-supervised learning. It discusses methods that use knowledge adaptation from images to videos to improve action recognition performance in videos. Specifically, it describes approaches that use labeled videos and unlabeled videos in a semi-supervised framework to address limitations of fully supervised methods, such as data scarcity and overfitting. The document reviews papers on techniques like pose-based recognition, color descriptors, grouplet representations, relevance feedback, event recognition from web data, dense trajectories, and discriminative key poses.
IRJET- DNA Fragmentation Pattern and its Application in DNA Sample Type Class... (IRJET Journal)
This document proposes a framework for classifying DNA sample types using DNA fragmentation patterns. It involves several steps: (1) applying Gaussian blurring and bilateral filtering to reduce noise from images of fragmentation patterns, (2) extracting the region of interest, (3) calculating gray-level co-occurrence matrix features such as contrast and correlation, (4) using a k-nearest neighbors classifier to classify samples, and (5) segmenting images based on the classification. The results showed near 100% accuracy in classifying hundreds of DNA samples as different types based on their fragmentation patterns.
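Steps (3) and (4) can be sketched with scikit-image and scikit-learn (the feature set is limited to the two properties named above, and the data variables are hypothetical):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.neighbors import KNeighborsClassifier

def glcm_features(gray_roi):
    # Texture descriptor from the grey-level co-occurrence matrix,
    # using the properties named in the paper (contrast, correlation).
    glcm = graycomatrix(gray_roi, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    return np.array([graycoprops(glcm, p)[0, 0]
                     for p in ("contrast", "correlation")])

# X: (n_samples, 2) GLCM features, y: sample-type labels (hypothetical data)
# clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
# pred = clf.predict(glcm_features(new_roi)[None, :])
```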
Effective segmentation of sclera, iris and pupil in noisy eye images (TELKOMNIKA JOURNAL)
In today's security-sensitive environment, iris recognition has attracted the most attention among the various biometric technologies for personal authentication. One of the key steps in an iris recognition system is accurate segmentation of the iris from the surrounding noise, including the pupil and sclera, in a captured eye image. In our proposed method, the input image is first preprocessed using bilateral filtering. After preprocessing, contour based features such as brightness, colour and texture are extracted. Entropy is then measured on the extracted contour based features to effectively distinguish the data in the images. Finally, a convolutional neural network (CNN) is used, guided by the entropy measure, to segment the sclera, iris and pupil. The results are analysed to demonstrate that the proposed segmentation method performs better than existing methods.
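For reference, the entropy measure at the heart of the method can be computed per patch as plain Shannon entropy (the paper's exact formulation may differ):

```python
import numpy as np

def patch_entropy(gray_patch, levels=256):
    # Shannon entropy of a grayscale patch; high-entropy regions carry
    # more texture information and can steer the segmentation network.
    hist, _ = np.histogram(gray_patch, bins=levels, range=(0, levels))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))
```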
VIDEO SEGMENTATION & SUMMARIZATION USING MODIFIED GENETIC ALGORITHM (ijcsa)
Video summarization of segmented video is an essential process for video thumbnails, video surveillance and video downloading. Summarization extracts a few frames from each scene and creates a summary video that conveys the full video's course of action within a short duration. The proposed research work discusses the segmentation and summarization of frames. A genetic algorithm (GA) for segmentation and summarization is used to view the highlights of an event by selecting the few important frames required. The GA is modified to select only key frames for summarization, and the modified GA is compared with the standard GA.
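A toy GA for key-frame selection illustrates the idea; the chromosome encoding, fitness function and operators here are illustrative, not the paper's modified GA:

```python
import numpy as np

def ga_keyframes(frame_feats, k=10, pop=30, gens=50, mut=0.1, seed=0):
    # Each chromosome is a set of k frame indices; fitness rewards mutual
    # dissimilarity so the summary spreads over distinct scenes.
    rng = np.random.default_rng(seed)
    n = len(frame_feats)

    def fitness(idx):
        sel = frame_feats[idx]
        return np.linalg.norm(sel[:, None] - sel[None, :], axis=-1).sum()

    popn = [rng.choice(n, k, replace=False) for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=fitness, reverse=True)
        parents = popn[: pop // 2]
        children = []
        for _ in range(pop - len(parents)):
            a, b = rng.choice(len(parents), 2, replace=False)
            child = rng.permutation(np.union1d(parents[a], parents[b]))[:k]
            if rng.random() < mut:                 # mutation: swap in a frame
                child[rng.integers(k)] = rng.integers(n)
            if len(np.unique(child)) < k:          # repair duplicate indices
                child = rng.choice(n, k, replace=False)
            children.append(child)
        popn = parents + children
    return np.sort(max(popn, key=fitness))
```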
Measuring the Effects of Rational 7th and 8th Order Distortion Model in the R... (IOSRJVSP)
Distortion and attacks are among the biggest issues in video watermarking, since both degrade the embedded watermark. Watermarking is an embedding process: with its help, we insert data into digital objects. A few methods are available for authenticating and securing data, and watermarking likewise provides data security, copyright protection and authentication of the data. The proposed work studies distorted watermarked video, where the distortion follows a rational 7th and 8th order distortion model. We first embed the watermark information into the original video and then apply the distortion model that the watermarked video may undergo. We also calculate the PSNR (peak signal-to-noise ratio), SSIM (structural similarity index measure), correlation, BER (bit error rate) and MSE (mean square error) for the distorted watermarked video, and show the relationship of correlation and SSIM with BER, MSE and PSNR.
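The listed quality metrics can be computed as follows (uint8 grayscale frames assumed; BER is computed on the extracted watermark bits rather than on the frames):

```python
import numpy as np
from skimage.metrics import structural_similarity

def quality_metrics(ref, test, peak=255.0):
    ref = ref.astype(np.float64)
    test = test.astype(np.float64)
    mse = np.mean((ref - test) ** 2)
    psnr = 10 * np.log10(peak ** 2 / mse) if mse > 0 else float("inf")
    ssim = structural_similarity(ref, test, data_range=peak)
    corr = np.corrcoef(ref.ravel(), test.ravel())[0, 1]
    return {"MSE": mse, "PSNR": psnr, "SSIM": ssim, "Correlation": corr}

def bit_error_rate(bits_in, bits_out):
    # Fraction of watermark bits flipped by distortion.
    return float(np.mean(np.asarray(bits_in) != np.asarray(bits_out)))
```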
This document describes a new technique called "encrypted sensing" for capturing fingerprint images using digital holography and double random phase encoding (DRPE) for encryption. The fingerprint image is optically encrypted during capture using two random phase masks. This reduces the risk of theft or leakage of personal biometric data. The encrypted hologram can be decrypted into a clear fingerprint image only when the correct decryption key is applied. Experimental results show the decrypted images can be accurately verified for authentication purposes.
Survey on Image Integration of Misaligned Images (IRJET Journal)
The document discusses methods for integrating misaligned images to improve image quality under low lighting conditions. It reviews previous works that combine images like flash/no-flash pairs to transfer details and color, but have limitations when images are misaligned. The paper proposes a new method using a long-exposure image and flash image that introduces a local linear model to transfer color while maintaining natural colors and high contrast, without deteriorating contrast for misaligned pairs. It concludes that handling misaligned images remains a challenge with existing methods and further work is needed.
Development of a Location Invariant Crack Detection and Localisation Model (L... (CSCJournals)
The document presents a model called LICDAL (Location Invariant Crack Detection and Localization) that was developed to detect and localize cracks in unconstrained oil pipeline images using deep convolutional neural networks. LICDAL applies transfer learning on Faster R-CNN to make the model location invariant by training it on images of cracked pipelines from various locations. When tested on 10 new images from different locations, LICDAL detected and localized cracks with a mean average precision of 97.3%, demonstrating high performance and location invariance. The model performance is compared to an existing crack detection model, showing LICDAL improves crack identification through localization.
Remote Sensing Image Scene Classification (Gaurav Singh)
This project proposes methods for classifying scenes in remote sensing images. It compares the accuracy of traditional bag-of-visual-words (BoVW) models using handcrafted features to a bag-of-convolutional features (BoCF) model using deep learning. It also applies a grey wolf optimizer (GWO) algorithm for image segmentation. Results show BoCF doubled the accuracy of BoVW, and combining BoVW with GWO improved accuracy over BoVW alone. The project concludes more work is needed to better combine remote sensing data and deep learning for classification.
Deep-learning based single object tracker for night surveillance (IJECEIAES)
This document summarizes a research paper that proposes a deep learning-based single object tracker for night surveillance video. The tracker uses pre-trained convolutional neural networks coupled with fully connected layers. The network is trained online during tracking to model changes in the target object's appearance as it moves. The paper tests different hyperparameters for the online learning, including optimization algorithms, learning rates, and ratios of positive and negative training samples, to determine the optimal setup for nighttime tracking. It evaluates the tracker on 14 night surveillance videos and finds that using the Adam optimizer with a learning rate of 0.00075 and a 2:1 ratio of positive to negative samples achieves the best accuracy.
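A sketch of one online update with the reported best configuration (Adam, learning rate 0.00075, a 2:1 ratio of positive to negative samples); the feature dimension and head architecture are assumptions:

```python
import torch
import torch.nn as nn

# Online-updated classification head on top of frozen pre-trained CNN
# features (target vs. background).
head = nn.Sequential(nn.Linear(4608, 512), nn.ReLU(), nn.Linear(512, 2))
opt = torch.optim.Adam(head.parameters(), lr=0.00075)

def online_update(pos_feats, neg_feats):
    # Draw twice as many positive as negative samples around the current
    # target estimate, then take one gradient step.
    n = min(len(pos_feats) // 2, len(neg_feats))
    x = torch.cat([pos_feats[: 2 * n], neg_feats[:n]])
    y = torch.cat([torch.ones(2 * n, dtype=torch.long),
                   torch.zeros(n, dtype=torch.long)])
    loss = nn.functional.cross_entropy(head(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```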
COVID-19-Preventions-Control-System and Unconstrained Face-mask and Face-hand... (SaiPrakash106)
The aim of the project is to use the trained model to classify whether a person is wearing a mask or not, and also to detect improper mask wearing and face-hand interaction.
Inclined Image Recognition for Aerial Mapping using Deep Learning and Tree ba... (TELKOMNIKA JOURNAL)
One of the important capabilities of an unmanned aerial vehicle (UAV) is aerial mapping. Aerial mapping is an image registration problem, i.e., the problem of transforming different sets of images into one coordinate system. In image registration, the quality of the output is strongly influenced by the quality of the input (i.e., images captured by the UAV). Selecting high quality input images is therefore important, and one of the challenging tasks in aerial mapping, because the ground truth for the mapping process is not available before the UAV flies. Typically, a UAV takes images in sequence irrespective of its flight orientation and roll angle. This may result in the acquisition of bad quality images, possibly compromising the quality of the mapping results and increasing the computational cost of the registration process. To address these issues, we need a recognition system that is able to recognize images that are not suitable for the registration process. In this paper, we define these unsuitable images as "inclined images," i.e., images captured by the UAV that are not perpendicular to the ground. Although the inclination angle could be calculated using a gyroscope attached to the UAV, our interest here is to recognize inclined images without additional sensors, in order to mimic how humans perform this task visually. To that end, we combine a deep learning method with tree-based models to build an inclined image recognition system. We validated the proposed system on images captured by the UAV: we collected 192 images and labelled them at two levels of granularity (coarse and fine classification). Compared with several other models, our proposed system improved the accuracy rate by up to 3%.
1. The document proposes a method for unsupervised visual domain adaptation using auxiliary information from the target domain. It uses partial least squares (PLS) to generate target domain subspaces incorporating subsidiary non-visual data like depth features.
2. Experiments on object recognition across the ImageNet and B3DO datasets show the method outperforms previous subspace-based approaches that only use visual information. Using auxiliary target domain data improves classification accuracy consistently.
3. The approach is an improvement as most prior work on unsupervised domain adaptation did not leverage non-visual cues available in the target domain. This opens possibilities to incorporate other multimedia signals from life logging systems.
Target Detection and Classification Improvements using Contrast Enhanced 16-b... (sipij)
In our earlier target detection and classification papers, we used 8-bit infrared videos from the Defense Systems Information Analysis Center (DSIAC) video dataset. In this paper, we focus on how the target detection and classification results can be improved using 16-bit videos. One problem with the 16-bit videos is that some image frames have very low contrast. Two methods were explored to improve upon previous detection and classification results. The first method was effectively the same as the baseline 8-bit processing, but used the 16-bit raw data rather than the 8-bit data taken from the AVI files. The second method was a second order histogram matching algorithm that preserves the 16-bit nature of the videos while providing normalization and contrast enhancement. Results showed that the second order histogram matching algorithm improved both target detection using You Only Look Once (YOLO) and classification using a Residual Network (ResNet): the average precision (AP) metric of YOLO improved by 8% and the overall accuracy (OA) of ResNet improved by 12%, both significant gains.
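For orientation, plain (first order) histogram matching of a 16-bit frame to a reference frame can be written as below; the paper's second order variant, which also matches second order statistics, is not reproduced here:

```python
import numpy as np

def match_histogram_16bit(frame, reference):
    # Quantile mapping: the i-th smallest source pixel takes the value of
    # the rank-scaled reference pixel, preserving the 16-bit range.
    src = frame.ravel()
    order = np.argsort(src)
    ref_sorted = np.sort(reference.ravel())
    positions = np.linspace(0, len(ref_sorted) - 1, len(src))
    matched = np.empty_like(src, dtype=np.uint16)
    matched[order] = np.interp(positions, np.arange(len(ref_sorted)),
                               ref_sorted).astype(np.uint16)
    return matched.reshape(frame.shape)
```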
Partial half fine-tuning for object detection with unmanned aerial vehicles (IAESIJAI)
Deep learning has shown outstanding performance in object detection tasks with unmanned aerial vehicles (UAVs), where fine-tuning is used to improve performance by transferring features from pre-trained models to specific tasks. However, despite the immense popularity of fine-tuning, no prior work has studied its precise effects on object detection with UAVs. In this research, we conduct an experimental analysis of each existing fine-tuning strategy to determine the best procedure for transferring features. We also propose a partial half fine-tuning strategy with two variants: first half fine-tuning (First half F-T) and final half fine-tuning (Final half F-T). We use the VisDrone dataset for training and validation. We show that Final half F-T outperforms the other fine-tuning techniques and also beats one of the state-of-the-art methods by 19.7% relative to the best results of previous studies.
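A sketch of the freezing scheme in PyTorch, splitting the backbone's parameter list in half; the detector and the exact split point are illustrative, not the paper's setup:

```python
import torch
import torchvision

# Final half fine-tuning: freeze the first half of the backbone parameters
# and train only the second half (plus the task-specific heads).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
backbone_params = list(model.backbone.parameters())
for p in backbone_params[: len(backbone_params) // 2]:
    p.requires_grad = False            # first half stays frozen

trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.SGD(trainable, lr=0.005, momentum=0.9)
```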
DEEP LEARNING BASED TARGET TRACKING AND CLASSIFICATION DIRECTLY IN COMPRESSIV... (sipij)
Past research has found that compressive measurements save data storage and bandwidth. However, compressive measurements are difficult to use directly for target tracking and classification without pixel reconstruction, because a Gaussian random measurement matrix destroys the target location information in the original video frames. This paper summarizes our research on target tracking and classification directly in the compressive measurement domain. We focus on one type of compressive measurement based on pixel subsampling: the compressive measurements are obtained by randomly subsampling the original pixels in the video frames. Even in this special setting, conventional trackers do not work well. We propose a deep learning approach that integrates YOLO (You Only Look Once) for multiple target detection and ResNet (residual network) for target classification in low quality videos. Extensive experiments using optical and mid-wave infrared (MWIR) videos from the SENSIAC database demonstrated the efficacy of the proposed approach.
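The pixel-subsampling measurement can be sketched in a few lines (the keep ratio is an assumption):

```python
import numpy as np

def subsample_pixels(frame, keep_ratio=0.25, seed=0):
    # Keep a random fraction of pixels in place and zero out the rest, so
    # target location information survives, unlike with Gaussian random
    # projection measurements.
    rng = np.random.default_rng(seed)
    mask = rng.random(frame.shape[:2]) < keep_ratio
    measured = np.zeros_like(frame)
    measured[mask] = frame[mask]
    return measured, mask
```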
PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO... (sipij)
We present a video compression framework that has two key features. First, we aim at achieving perceptually lossless compression for low frame rate (6 fps) videos. Four well-known video codecs from the literature were evaluated, and their performance was assessed using four well-known performance metrics. Second, we investigated the impact of error concealment algorithms for handling pixels corrupted by transmission errors in communication channels. Extensive experiments using actual videos demonstrate the proposed framework.
IRJET- A Real Time Yolo Human Detection in Flood Affected Areas based on Vide... (IRJET Journal)
This document proposes a method for real-time human detection in flood-affected areas using video content analysis and the YOLO object detection algorithm. It trains YOLO on the COCO Human dataset to detect and localize humans in video frames from surveillance cameras. The results show that YOLO can accurately detect multiple humans, even with occlusion, and single humans under varying illumination. This approach aims to help rescue operations quickly identify affected areas and prioritize aid.
This document discusses the development of a face mask detection system using YOLOv4. The system uses a deep learning model with YOLOv4 to detect faces in real-time video and determine if each person is wearing a mask or not. It is trained on images of faces with and without masks. The model uses CSPDarknet53 as the backbone network and PANet for feature aggregation. It is implemented with OpenCV and a Python GUI for a user interface. The goal is to help enforce mask mandates and alert authorities if too many people in an area are not wearing masks.
IRJET- Survey Paper on Anomaly Detection in Surveillance Videos (IRJET Journal)
This document discusses anomaly detection in surveillance videos using deep learning techniques. It presents a multiple instance learning (MIL) framework that treats videos as bags and video segments as instances. A deep MIL ranking model is trained using weakly labeled videos (labeled as normal or anomalous at the video level, not segment level) to learn to assign higher anomaly scores to anomalous segments. The model incorporates sparsity and temporal smoothness constraints to predict sparse, smoothly varying anomaly scores. Evaluation shows this MIL approach achieves better anomaly detection performance than state-of-the-art methods. However, low anomaly recognition rates on challenging datasets leave room for future work to improve detection of unknown anomalies.
Multi Image Deblurring using Complementary Sets of Fluttering Patterns by Mul... (IRJET Journal)
This document discusses a proposed method for multi-image deblurring using complementary sets of fluttering patterns and an alternating direction multiplier method. Existing methods for coded exposure and multi-image deblurring have limitations like generating complex fluttering patterns, low signal-to-noise ratio, and loss of spectral information. The proposed method uses a multiplier algorithm to optimize a latent image and generate simple binary fluttering patterns for single or multiple input images. This helps reduce spectral loss and recover spatially consistent deblurred images with minimum noise. The method involves preprocessing the input image, setting regularization parameters, performing deconvolution iteratively using matrices, and outputting a deblurred image with sharp details and low noise.
Real Time Sign Language Recognition Using Deep Learning (IRJET Journal)
The document describes a study that used the YOLOv5 deep learning model to perform real-time sign language recognition. The researchers trained and tested the model on the Roboflow dataset along with additional images. They achieved 88.4% accuracy, 76.6% precision, and 81.2% recall. For comparison, they also trained a CNN model which achieved lower accuracy of 52.98%. The YOLOv5 model was able to detect signs in complex environments and perform accurate real-time detection, demonstrating its advantages over CNN for this task.
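A minimal YOLOv5 inference sketch via torch.hub, using generic pretrained weights rather than the paper's model fine-tuned on the Roboflow sign-language data:

```python
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
results = model("hands.jpg")    # hypothetical test image path
results.print()                 # per-class detections and confidences
boxes = results.xyxy[0]         # (n, 6): x1, y1, x2, y2, conf, class
```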
Realtime Face mask Detector using YoloV4 (IRJET Journal)
This document presents a real-time face mask detector using YOLOv4. The system was able to detect faces with 94.75% accuracy and a maximum frame rate of 38 FPS. It used a dataset of images with bounding box annotations that was split into training and validation sets. The YOLOv4 model was trained on the darknet framework using this dataset. Testing showed it could accurately detect single people with or without masks and multiple people in various scenarios. The authors conclude the model provides fast and accurate mask detection in real-time suitable for applications like monitoring mask compliance.
Implementation of features dynamic tracking filter to tracing pupils (sipij)
The objective of this paper is to show the implementation of an artificial vision filter capable of tracking the pupils of a person in a video sequence. Several algorithms can achieve this objective; in this case, features dynamic tracking was selected, a method that traces patterns across the frames that form a video scene. This type of processing offers the advantage of eliminating occlusion problems for the patterns of interest. The implementation was tested on a database of videos of people with different physical characteristics of the eyes. An additional goal is to obtain information on the captured eye movements and the pupil coordinates for each of these movements. These data could help studies related to eye health.
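As a stand-in for the paper's tracking filter, pyramidal Lucas-Kanade optical flow can trace seed points (here, hypothetical pupil locations) from frame to frame:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("eyes.avi")                    # hypothetical video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
pts = np.array([[[120.0, 80.0]], [[200.0, 82.0]]], dtype=np.float32)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    pts = pts[status.ravel() == 1].reshape(-1, 1, 2)  # drop lost points
    print(pts.reshape(-1, 2))                         # pupil coordinates
    prev_gray = gray
```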
End-to-end deep auto-encoder for segmenting a moving object with limited tra...IJECEIAES
The document proposes two end-to-end deep auto-encoder approaches for segmenting moving objects from surveillance videos when limited training data is available. The first approach uses transfer learning with a pre-trained VGG-16 model as the encoder and its transposed architecture as the decoder. The second approach uses a multi-depth auto-encoder with convolutional and upsampling layers. Both approaches apply data augmentation techniques like PCA and traditional methods to increase the training data size. The models are trained and evaluated on the CDnet2014 dataset, achieving better performance than other models trained with limited data.
This document proposes using the YOLOv5 object detection framework for real-time ship detection in satellite images. It reviews existing ship detection methods including machine learning and deep learning approaches. The methodology uses a dataset of satellite images with ship annotations to train and evaluate YOLOv5 models of different sizes (nano, small, medium, large, extra-large). Experimental results show the performance of each model on metrics like mAP, precision, and recall for real-time ship detection.
Video saliency-recognition by applying custom spatio temporal fusion techniqueIAESIJAI
Video saliency detection is a major growing field with quite few contributions to it. The general method available today is to conduct frame wise saliency detection and this leads to several complications, including an incoherent pixel-based saliency map, making it not so useful. This paper provides a novel solution to saliency detection and mapping with its custom spatio-temporal fusion method that uses frame wise overall motion colour saliency along with pixel-based consistent spatio-temporal diffusion for its temporal uniformity. In the proposed method section, it has been discussed how the video is fragmented into groups of frames and each frame undergoes diffusion and integration in a temporary fashion for the colour saliency mapping to be computed. Then the inter group frame are used to format the pixel-based saliency fusion, after which the features, that is, fusion of pixel saliency and colour information, guide the diffusion of the spatio temporal saliency. With this, the result has been tested with 5 publicly available global saliency evaluation metrics and it comes to conclusion that the proposed algorithm performs better than several state-of-the-art saliency detection methods with increase in accuracy with a good value margin. All the results display the robustness, reliability, versatility and accuracy.
PRACTICAL APPROACHES TO TARGET DETECTION IN LONG RANGE AND LOW QUALITY INFRARED VIDEOS
Signal & Image Processing: An International Journal (SIPIJ), Vol.12, No.3, June 2021
DOI: 10.5121/sipij.2021.12301
Chiman Kwan and David Gribben
Applied Research, LLC, Rockville, Maryland, USA
ABSTRACT
It is challenging to detect vehicles in long range and low quality infrared videos using deep learning
techniques such as You Only Look Once (YOLO) mainly due to small target size. This is because small
targets do not have detailed texture information. This paper focuses on practical approaches for target
detection in infrared videos using deep learning techniques. We first investigated a newer version of You
Only Look Once (YOLO v4). We then proposed a practical and effective approach by training the YOLO
model using videos from longer ranges. Experimental results using real infrared videos ranging from 1000
m to 3500 m demonstrated huge performance improvements. In particular, the average detection
percentage over the six ranges of 1000 m to 3500 m improved from 54% when we used the 1500 m videos
for training to 95% if we used the 3000 m videos for training.
KEYWORDS
Deep learning; YOLO v4; infrared videos; target detection; training strategy
1. INTRODUCTION
Two groups of target detection algorithms have normally been applied to infrared videos. The first group of methods [1]-[6] requires the target locations in the first frame of a video to be known; the target features extracted from that frame are then used to detect the targets in subsequent frames. The second group uses deep learning algorithms such as You Only Look Once (YOLO) for target detection in optical and infrared videos [7]-[21]. Target locations in the first frame are no longer required, although these algorithms do need some training videos. Some deep learning algorithms [6]-[16] operate on compressive measurements directly, avoiding the time-consuming reconstruction of compressive measurements, so fast target detection and classification can be achieved.
In long range infrared videos, the target size is small, the resolution is low, and the video quality, such as contrast, is poor. It is therefore extremely important to develop practical methods that can improve the detection performance of deep learning methods. In [22], we proposed incorporating video super-resolution (VSR) techniques to enhance target detection performance in infrared videos: the resolution of the video frames is improved by two to four times, and the target detection and classification performance on actual infrared videos was observed to improve. In another paper [23], low contrast videos were enhanced using 16-bit processing, and we again observed improved target detection and classification performance. In [24], target motion information was exploited using optical flow techniques, and target detection performance improved significantly. However, if the targets are moving towards or away from the imager, target motion may be difficult to extract. In such
scenarios, target detection using single frames can be applied. In [25][26], target detection
techniques based on single frames were proposed and evaluated. There are also many recent
papers discussing various methods for target detection (no classification) in infrared
images/videos [27]-[39].
In our early target detection and classification papers [7]-[10], YOLO v3 [40] was used. Recently, new YOLO versions have appeared in the public domain. In this paper, we present practical approaches to enhancing the detection performance in long range infrared videos. First, we investigated target detection performance enhancement using YOLO v4 [42], which has more layers than YOLO v3. After running some experiments, we observed that the improvement in target detection was slight but noticeable; for instance, for the 2000 m videos, the average precision improved from 52% to 56%. Second, we investigated target detection performance
enhancement using a new training strategy. In our previous papers, we used the 1500 m videos to train the YOLO model. Although target detection performance is good for videos at the 1000 m and 2000 m ranges, the performance drops significantly for the 2500 m to 3500 m videos. After some extensive experiments, we observed that models trained on the 2500 m or 3000 m videos actually performed the best across all ranges. For example, the average detection percentage over the six ranges of 1000 m to 3500 m is 54% when the 1500 m videos are used for training; if the 3000 m videos are used instead, it becomes 95%. This is a dramatic improvement.
Our key contribution is the following: we propose a practical training strategy to improve target detection in long range infrared videos. Instead of using near range videos to train the YOLO model, the best strategy is to use far range videos for training. Although the idea is simple, we observed a dramatic performance improvement.
The remainder of this paper is organized as follows. Section 2 briefly summarizes the target detection algorithm, the performance metrics for target detection, and the data. Section 3 shows that the newer YOLO v4 improves over the older v3 on actual long range infrared videos. Section 4 summarizes a new training strategy that can improve detection performance at all ranges; we also compare with a two-model approach. Section 5 concludes the paper with a few remarks.
2. ALGORITHMS, METRICS, AND DATA
2.1. YOLOv3 versus YOLOv4
Here, we briefly compare the YOLO v3 and v4 models. In general, they are largely the same: they share the same overall framework. The Input and Backbone stages, shown in Fig. 1, are identical except for the Darknet model used. In v3 the backbone is Darknet53, a 53-layer model depicted in Fig. 2, while v4 uses a 137-layer Darknet model.
Figure 1. YOLO v4 block diagram. Input is the images; Backbone is the base Darknet model; Dense Prediction is the YOLO model; Sparse Prediction is Faster R-CNN [41].
Figure 2. Diagram showing construction of the layers in the Darknet53 model.
According to the paper [42] by the authors of YOLO v4, it is faster and more accurate than YOLO v3. As shown in Fig. 3, the improvement is about 10 percent in Average Precision at the same frames per second (FPS).
Figure 3. Comparison of YOLO v4, YOLO v3, and other strong object detectors [42].
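As a concrete point of reference for the experiments that follow, the sketch below shows one common way to run a trained YOLO v4 network, using OpenCV's DNN module. The file names (yolov4.cfg, yolov4.weights, mwir_frame.png) are placeholders of our own; this is an illustrative sketch, not the pipeline used in this study.

    import cv2
    import numpy as np

    # Placeholder config/weights names; any Darknet-format YOLO v4 pair works.
    net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
    out_names = net.getUnconnectedOutLayersNames()

    frame = cv2.imread("mwir_frame.png")            # one video frame
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)

    h, w = frame.shape[:2]
    for output in net.forward(out_names):
        for det in output:                 # [cx, cy, bw, bh, obj, class scores...]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            if scores[class_id] > 0.5:     # simple confidence threshold
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                print(f"class {class_id} at ({cx:.0f}, {cy:.0f}), "
                      f"box {bw:.0f}x{bh:.0f}")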
2.2. Performance Metrics for Assessing Target Detection Performance
The five performance metrics used to assess the detection performance are: Center Location Error (CLE), Distance Precision at 10 pixels (DP@10), Estimates in Ground Truth (EinGT), Intersection over Union (IoU), and the number of frames with detection. These metrics are summarized as follows:
• Center Location Error (CLE): This is the error between the center of the detected bounding box and the center of the ground-truth bounding box. Smaller means better. CLE is calculated by measuring the distance between the ground truth center location $(C_{x,gt}, C_{y,gt})$ and the detected center location $(C_{x,est}, C_{y,est})$. Mathematically, CLE is given by

$CLE = \sqrt{(C_{x,est} - C_{x,gt})^2 + (C_{y,est} - C_{y,gt})^2}$. (1)
• Distance Precision (DP): This is the percentage of frames where the centroids of detected
bounding boxes are within 10 pixels of the centroid of ground-truth bounding boxes. Close
to 1 or 100% indicates good results.
• Estimates in Ground Truth (EinGT): This is the percentage of the frames where the
centroids of the detected bounding boxes are inside the ground-truth bounding boxes. It
depends on the size of the bounding box and is simply a less strict version of the DP metric.
Close to 1 or 100% indicates good results.
• Intersection over Union (IoU): This is the ratio of the intersected area over the union of the estimated and ground truth bounding boxes:

$IoU = \frac{\text{Area of Intersection}}{\text{Area of Union}}$. (2)
• Number of frames with detection: This is the total number of frames that have a detection.
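To make these definitions concrete, the following minimal Python sketch (our own illustration, not code from the study) computes the per-frame quantities for axis-aligned boxes given as (x, y, w, h); DP, EinGT, and percent detection are then simply the fractions of frames for which the corresponding test passes.

    import numpy as np

    def center(box):
        """Center (cx, cy) of an axis-aligned box given as (x, y, w, h)."""
        x, y, w, h = box
        return np.array([x + w / 2.0, y + h / 2.0])

    def cle(est_box, gt_box):
        """Center Location Error, Eq. (1): distance between box centers."""
        return float(np.linalg.norm(center(est_box) - center(gt_box)))

    def within_dp(est_box, gt_box, threshold=10.0):
        """Distance Precision test: detected center within `threshold` pixels."""
        return cle(est_box, gt_box) <= threshold

    def in_ground_truth(est_box, gt_box):
        """EinGT test: detected center lies inside the ground-truth box."""
        cx, cy = center(est_box)
        x, y, w, h = gt_box
        return x <= cx <= x + w and y <= cy <= y + h

    def iou(est_box, gt_box):
        """Intersection over Union of two axis-aligned boxes, Eq. (2)."""
        ax, ay, aw, ah = est_box
        bx, by, bw, bh = gt_box
        ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
        iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
        inter = ix * iy
        union = aw * ah + bw * bh - inter
        return inter / union if union > 0 else 0.0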
2.3. Defense Systems Information Analysis Center (DSIAC) Data
The DSIAC dataset is more challenging than MOT [43] because the targets in MOT are larger and the image resolution is better. We selected five vehicles in the DSIAC videos for our experiments. There are optical and mid-wave infrared (MWIR) videos collected at distances ranging from 1000 m to 5000 m in 500 m increments. The five types of vehicles are shown in Figure 4. These videos are challenging for several reasons. First, the target sizes are small due to the long ranges; this is very different from benchmark datasets such as the MOT Challenge [43], where the range is short and the targets are big. Second, the target orientations change drastically. Third, the illumination differs between videos. Fourth, the camera moves in some videos.
In this research, MWIR night-time videos were used because MWIR is more effective than optical video for surveillance at night. The video frame rate is 7 frames/second, the image size is 640x512, and each video has 1800 frames. All frames are contrast enhanced using reference frames from the 1500 m range videos, and we used 16-bit videos in all of the experiments in this paper, owing to the improved performance of 16-bit videos observed in our earlier paper [23]. An illustrative sketch of such a contrast stretch is given after Figure 4.
Figure 4. Five vehicles in DSIAC: (a) BTR70; (b) BRDM2; (c) BMP2; (d) T72; and (e) ZSU23-4.
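The contrast enhancement procedure itself is described in [23]; purely as an illustration of the idea, a percentile-based stretch of a 16-bit frame might look like the sketch below. The function and its parameters are our assumptions, not the authors' code.

    import numpy as np

    def stretch_contrast_16bit(frame, low_pct=1.0, high_pct=99.0):
        """Linearly stretch a 2-D uint16 frame so the chosen percentiles
        map to the full 16-bit range; values outside are clipped."""
        lo, hi = np.percentile(frame, [low_pct, high_pct])
        if hi <= lo:                       # flat frame; nothing to stretch
            return frame.copy()
        out = (frame.astype(np.float64) - lo) / (hi - lo)
        return (np.clip(out, 0.0, 1.0) * 65535).astype(np.uint16)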
3. ENHANCING TARGET DETECTION PERFORMANCE USING A NEW VERSION OF YOLO (YOLO V4)
In our past papers [7]-[10], we used YOLO v3 in our experiments. In the course of our research,
we found that there is a new version of YOLO. The objective of this section is to summarize our
study on how YOLO v4 is better than YOLO v3.
Statistically, the comparison is done using several performance metrics. Center Location Error (CLE) is the average pixel distance between the estimated center of the vehicle bounding box and the ground truth center location. Distance Precision (DP) is the percentage of detections that fall within a certain number of pixels of the ground truth center location; for this project that distance is 20 pixels. Estimates in Ground Truth (EinGT) is the percentage of centroid values that fall within the ground truth bounding box. Intersection over Union (IoU) is the average area of intersection between the detected and ground truth bounding boxes divided by the area of their union, expressed as a percentage. The final metric, Percent Detection (%det), is the percentage of frames that have any detection within the frame. Table 1 and Table 2 contain the metrics used to judge how well the model is trained. The 1500 m rows report detection metrics on the same videos that were used for training. Table 1 has the average results for each distance for YOLO v3, while Table 2 has the average results for each distance for YOLO v4, both trained at the 1500 m distance. Detection metrics for individual vehicles at various ranges can be found in Appendix 1.
We used the 1500 m videos to train both YOLO models; there are five videos at this range, with 1800 frames per video. Comparing the two versions of YOLO shows that, in general, there is not a huge difference between them. For YOLO v4, the first three distances are largely improved relative to YOLO v3, so distances adjacent to the training distance have a higher degree of accuracy. However, the farther distances, the 2500 m through 3500 m ranges, show a much steeper degradation of detection. The accuracy metrics still improve, but only slightly, and YOLO v4 is not universally better performing.
Table 1. Average performance metrics for each distance using a 1500 m trained YOLO v3 model. Detailed
metrics for each individual vehicle at various ranges can be found in Table 6 of Appendix 1.
Distance (m) CLE DP EinGT IoU % det.
1000 3.645 100.00% 100.00% 71.79% 94.15%
1500 1.368 100.00% 100.00% 83.99% 100.00%
2000 2.009 99.99% 99.99% 52.24% 90.64%
2500 6.810 98.37% 98.20% 20.14% 44.03%
3000 3.924 100.00% 78.33% 11.54% 11.48%
3500 3.481 100.00% 49.83% 2.94% 6.19%
Avg. 3.540 99.73% 87.73% 40.44% 57.75%
Table 2. Average performance metrics for each distance using a 1500 m trained YOLO v4 model. Detailed
metrics for each individual vehicle at various ranges can be found in Table 7 of Appendix 1.
Distance (m) CLE DP EinGT IoU % det.
1000 2.968 100.0% 100.0% 72.59% 100.0%
1500 1.100 100.0% 99.99% 89.40% 100.0%
2000 1.557 100.0% 100.0% 56.47% 100.0%
2500 8.612 97.43% 97.43% 27.21% 54.33%
3000 2.148 100.0% 100.0% 15.78% 6.32%
3500 1.022 40.00% 24.42% 4.43% 0.76%
Avg. 2.901 89.57% 86.97% 44.31% 50.20%
4. VEHICLE DETECTION PERFORMANCE IMPROVEMENT USING A NEW TRAINING STRATEGY
In the previous section, we observed that, even with the new version of YOLO, the target detection performance did not improve much at the long ranges, only at the 1000 m and 2000 m ranges. After analyzing the earlier results, we speculated that a YOLO model trained on a certain range will be effective only for videos collected close to the range of the training data. This observation led us to a new training strategy for long range videos.
Here, we summarize a new training strategy for training YOLO for target detection. The goal is
to minimize false positives and missed detections.
4.1. A New Strategy for Target Detection
This investigation stems from a pattern seen in Table 1 and Table 2: the 1000 m distance performs better than the other adjacent distance, 2000 m. The hypothesis is that the YOLO model detects the vehicles more accurately when the training distance is greater than the tested distance. The next few tables examine whether this holds.
The results for the 2000 m trained model in Table 3 show that the 2000 m distance clearly improved, as expected, because that is the distance the model was trained on. At the same time, there is little degradation in performance at the 1000 m distance, and there is a vast improvement in the 3000 m results. The improvement at 3500 m is more tepid, but it is clear that performance improves across all distances when the YOLO model is trained on the farther distance videos.
Table 3. Average performance metrics for the 2000 m trained YOLO v4 model. Detailed metrics for each individual vehicle at various ranges can be found in Table 8 of Appendix 2. The 2000 m row reports testing results on the same data used for training.
Distance (m) CLE DP EinGT IoU % det.
1000 3.501 99.9% 100.0% 64.30% 99.6%
1500 2.077 99.88% 99.89% 63.25% 99.6%
2000 1.023 100.0% 100.0% 89.19% 100.0%
2500 13.955 94.75% 94.75% 49.14% 80.44%
3000 1.951 99.5% 99.3% 31.20% 45.40%
3500 1.742 99.67% 94.66% 9.07% 10.90%
Avg. 4.042 98.95% 98.10% 51.03% 72.66%
One anomaly should be noted: the poor CLE value at the 2500 m distance. At first it looks like an error or poor testing results, but further inspection of the BMP2 video revealed that there is a second vehicle driving in circles in the background of that video. Fig. 5 contains a frame of that video showing both vehicles: the correct BMP2 vehicle in the center of the frame and the background vehicle. YOLO detects the background vehicle and, because all detections are measured and considered when generating metrics, this greatly affects the CLE results. The effect becomes very apparent in Table 4 when the training distance is 2500 m; a short sketch after Fig. 5 illustrates it with hypothetical numbers.
Figure 5. One frame of 2500 m video of BMP2 vehicle. Red box around background vehicle and
green box around BMP2 vehicle.
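To see why a single spurious detection dominates an averaged CLE, consider the following sketch with hypothetical pixel coordinates (ours, not measurements from the video).

    import numpy as np

    gt_center = np.array([320.0, 256.0])     # ground-truth center (hypothetical)
    detections = np.array([
        [321.0, 257.0],                      # correct detection, about 1.4 px off
        [520.0, 110.0],                      # spurious background vehicle
    ])

    errors = np.linalg.norm(detections - gt_center, axis=1)
    print(errors.round(1))                                       # [  1.4 247.6]
    print(f"mean CLE over all detections: {errors.mean():.1f}")  # about 124.5

Because CLE is averaged over all detections, one off-target detection can shift it by two orders of magnitude even when every other metric looks healthy.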
Table 4. Average performance metrics for the 2500 m trained YOLO v4 model. Detailed metrics for each individual vehicle at various ranges can be found in Table 9 of Appendix 2. The 2500 m row reports testing results on the same data used for training.
Distance (m) CLE DP EinGT IoU % det.
1000 3.148 99.9% 100.0% 49.89% 97.3%
1500 2.024 100.0% 100.0% 49.26% 100.0%
2000 1.665 100.0% 100.0% 71.93% 99.5%
2500 1.071 100.0% 100.0% 71.77% 100.0%
3000 1.603 100.0% 100.0% 42.19% 91.42%
3500 1.881 99.79% 94.45% 9.28% 47.29%
Avg. 1.899 99.96% 99.08% 49.05% 89.25%
We continue to see only small losses at the closer distances as the training distance moves farther away. There is also a huge improvement in the CLE results in Table 4 for the 2500 m distance. This was expected because, now that the model is trained on that distance, only the relevant foreground vehicle is detected as a possible match. In terms of average statistics, this is the best performing training distance. While the 3500 m videos still do not have great detection results and the IoU is still quite poor, every other value is close to the best results seen at the other training distances.
Table 5. Average performance metrics for the 3000 m trained YOLO v4 model. Detailed metrics for each individual vehicle at various ranges can be found in Table 10 of Appendix 2. The 3000 m row reports testing results on the same data used for training.
Distance (m) CLE DP EinGT IoU % det.
1000 4.071 99.2% 100.0% 27.29% 83.8%
1500 1.954 100.0% 100.0% 30.44% 97.2%
2000 1.522 100.0% 100.0% 49.85% 95.4%
2500 24.089 90.5% 90.5% 65.25% 96.7%
3000 0.950 100.0% 100.0% 80.41% 100.0%
3500 1.330 99.95% 96.20% 23.38% 95.11%
Avg. 5.653 98.28% 97.78% 46.10% 94.7%
There is, however, one more training distance to observe: 3000 m. In Table 5, we see the greatest number of detections for any trained model. There is a large degradation in the CLE metric at 2500 m; 24.089 is by far the worst result for any YOLO model. If the explanation above for why that number is so poor is accepted, then the results for every other metric are the most consistent of any trained model. It is not perfect, as seen by the 23% IoU at 3500 m, but it certainly performs better than most if not all of the others.
Looking at all the YOLO v4 models trained at different distances, we see a clear pattern of improvement as the training dataset moves farther away. An argument can be made for whether the 2500 m or the 3000 m model is best. The biggest negative of the 2500 m model is that the 3500 m videos have an average detection of just 47.29% and the IoU is rather low; the biggest negative of the 3000 m model is that the CLE at 2500 m is extremely poor because of the background vehicle interfering with the detections of the BMP2 vehicle.
Our study shows that using longer range videos (2500 m or 3000 m) to train YOLO can achieve good detection results at all ranges. A simple explanation is that YOLO has built-in data augmentation, which can generate targets of different sizes, and in our experiments YOLO is clearly more capable of generating training images that are larger than the originals than of the opposite case. A sketch of this kind of scale augmentation is given below.
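YOLO's internal augmentation pipeline is not reproduced here; as an illustration of the kind of scale jitter involved, the sketch below (our own simplification) enlarges a frame and its (x, y, w, h) boxes by a random factor, which is the direction of scaling that our results suggest works well.

    import random
    import cv2

    def random_upscale(image, boxes, lo=1.0, hi=2.0):
        """Randomly enlarge an image and its (x, y, w, h) boxes by a factor
        in [lo, hi], emulating augmentation that grows small targets."""
        s = random.uniform(lo, hi)
        h, w = image.shape[:2]
        scaled = cv2.resize(image, (int(w * s), int(h * s)),
                            interpolation=cv2.INTER_LINEAR)
        scaled_boxes = [(x * s, y * s, bw * s, bh * s)
                        for (x, y, bw, bh) in boxes]
        return scaled, scaled_boxes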
4.2. Comparison with a Two-Model Approach
In Section 3 and Section 4.1, we used a single model trained on videos from a single range. One may wonder what the performance would be with a two-model approach, which uses two separate models trained on videos from two separate ranges.

Here, we show the detection performance of using two models: one trained on videos from the 1500 m range and the other on videos from the 3000 m range. The 1500 m model is used only for the 1000 m to 2000 m ranges and the 3000 m model only for the 2500 m to 3500 m ranges. The combined results are shown in Table 6. Compared with the detection results in Table 5, the two-model performance is indeed better; the cost is that two models are needed, which takes longer to train. A minimal sketch of this range-based dispatch is given after Table 6.
Table 6. Average performance metrics using two YOLO v4 models, trained at 1500 m and 3000 m. The 1500 m and 3000 m rows report testing results on the same data used for training.
Distance (m) CLE DP EinGT IoU % det.
1000 2.968 100.0% 100.0% 72.59% 100.0%
1500 1.100 100.0% 99.99% 89.40% 100.0%
2000 1.557 100.0% 100.0% 56.47% 100.0%
2500 24.089 90.5% 90.5% 65.25% 96.7%
3000 0.950 100.0% 100.0% 80.41% 100.0%
3500 1.330 99.95% 96.20% 23.38% 95.11%
Avg. 5.333 98.41% 97.78% 64.5% 98.64%
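The dispatch logic behind Table 6 is simple; a minimal sketch follows, with model_1500 and model_3000 as placeholder callables of our own rather than artifacts from the study.

    def detect_with_two_models(frame, range_m, model_1500, model_3000):
        """Route each frame to the model trained on the nearer range:
        the 1500 m model for 1000-2000 m videos, the 3000 m model for
        2500-3500 m videos, as in Table 6."""
        model = model_1500 if range_m <= 2000 else model_3000
        return model(frame)   # each model maps a frame to a list of detections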
5. CONCLUSIONS AND FUTURE DIRECTIONS
In this paper, we have presented some practical and high performance methods to enhance target detection performance in long range infrared videos. We observed that using a new version of YOLO improves the performance slightly. Moreover, we demonstrated that using a new training
strategy, which simply uses longer range videos for training, can significantly improve the detection performance at all ranges. Finally, we observed that a two-model approach performs even better, at the cost of requiring more training data and training time.
One future research direction is to investigate target detection based on changes between frames.
Such a detection strategy is useful when there is motion in the targets. Another future direction is
to integrate super-resolution videos with the idea proposed in this paper.
CONFLICT OF INTEREST
The authors declare no conflict of interest.
ACKNOWLEDGMENT
This research was supported by the US Army under contract W909MY-20-P-0024. The views,
opinions and/or findings expressed are those of the author and should not be interpreted as
representing the official views or policies of the Department of Defense or the U.S. Government.
APPENDIX 1: SUPPORTING MATERIALS FOR SECTION 3
Table 7. Accuracy statistics for 1500 m trained YOLO v3.
1000 m CLE DP EinGT IoU % det.
BTR70 3.677 100.00% 100.00% 67.89% 99.89%
BRDM2 3.829 100.00% 100.00% 72.02% 92.22%
BMP2 3.762 100.00% 100.00% 70.52% 99.22%
T72 3.638 100.00% 100.00% 72.90% 85.17%
ZSU23-4 3.316 100.00% 100.00% 75.61% 94.28%
Avg 3.645 100.00% 100.00% 71.79% 94.15%
1500 m CLE DP EinGT IoU % det.
BTR70 1.401 100.00% 100.00% 83.00% 100.00%
BRDM2 1.266 100.00% 100.00% 83.16% 100.00%
BMP2 1.293 100.00% 100.00% 86.42% 100.00%
T72 1.491 100.00% 100.00% 85.90% 100.00%
ZSU23-4 1.387 100.00% 100.00% 81.45% 100.00%
Avg 1.368 100.00% 100.00% 83.99% 100.00%
2000 m CLE DP EinGT IoU % det.
BTR70 2.039 100.00% 100.00% 46.21% 91.72%
BRDM2 2.328 100.00% 100.00% 52.79% 99.39%
BMP2 2.005 100.00% 100.00% 59.22% 68.61%
T72 1.467 100.00% 100.00% 51.99% 94.44%
ZSU23-4 2.208 99.95% 99.95% 50.97% 99.06%
Avg 2.009 99.99% 99.99% 52.24% 90.64%
2500 m CLE DP EinGT IoU % det.
BTR70 2.804 99.89% 99.89% 16.85% 36.61%
BRDM2 3.193 100.00% 99.12% 18.85% 71.44%
BMP2 22.030 91.97% 91.97% 21.60% 28.78%
T72 2.978 100.00% 100.00% 24.18% 45.44%
ZSU23-4 3.046 100.00% 100.00% 19.23% 37.89%
Avg 6.810 98.37% 98.20% 20.14% 44.03%
3000 m CLE DP EinGT IoU % det.
BTR70 1.843 100.00% 100.00% 8.44% 10.33%
1500 m CLE DP EinGT IoU % det.
BTR70 1.936 100.0% 100.0% 39.93% 93.6%
BRDM2 1.430 100.0% 100.0% 27.19% 99.6%
BMP2 1.510 100.0% 100.0% 24.40% 94.4%
T72 2.168 100.0% 100.0% 27.67% 100.0%
ZSU23-4 2.724 100.0% 100.0% 33.01% 98.4%
Avg 1.954 100.0% 100.0% 30.44% 97.2%
2000 m CLE DP EinGT IoU % det.
BTR70 1.221 100.0% 100.0% 54.21% 98.3%
BRDM2 1.748 100.0% 100.0% 44.17% 100.0%
BMP2 1.476 100.0% 100.0% 43.86% 91.1%
T72 1.482 100.0% 100.0% 45.70% 100.0%
ZSU23-4 1.683 100.0% 100.0% 61.34% 87.8%
Avg 1.522 100.0% 100.0% 49.85% 95.4%
2500 m CLE DP EinGT IoU % det.
BTR70 1.155 100.0% 100.0% 63.71% 99.8%
BRDM2 1.920 100.0% 100.0% 73.95% 95.4%
BMP2 113.347 52.6% 52.6% 41.28% 100.0%
T72 2.305 100.0% 100.0% 75.16% 99.9%
ZSU23-4 1.720 100.0% 100.0% 72.16% 88.4%
Avg 24.089 90.5% 90.5% 65.25% 96.7%
3000 m CLE DP EinGT IoU % det.
BTR70 0.922 100.0% 100.0% 83.74% 100.00%
BRDM2 1.027 100.0% 100.0% 72.76% 100.00%
BMP2 0.885 100.0% 100.0% 82.95% 100.00%
T72 1.100 100.0% 100.0% 79.87% 100.00%
ZSU23-4 0.817 100.0% 100.0% 82.75% 100.00%
Avg 0.950 100.0% 100.0% 80.41% 100.00%
3500 m CLE DP EinGT IoU % det.
BTR70 1.047 100.0% 99.1% 21.36% 99.61%
BRDM2 1.535 99.73% 99.73% 23.91% 97.39%
BMP2 1.036 100.0% 100.00% 28.05% 96.17%
T72 1.568 100.0% 98.38% 20.80% 87.83%
ZSU23-4 1.463 100.0% 83.78% 22.77% 94.56%
Avg 1.330 99.95% 96.20% 23.38% 95.11%
REFERENCES
[1] C. Kwan, B. Chou, and L. M. Kwan, “A Comparative Study of Conventional and Deep Learning
Target Tracking Algorithms for Low Quality Videos,” 15th International Symposium on Neural
Networks, 2018; DOI: 10.1007/978-3-319-92537-0_60.
[2] L. Bertinetto, et al., “Staple: Complementary Learners for Real-Time Tracking,” In CVPR. 2016.
[3] C. Ma, X. Yang, C. Zhang, and M. H. Yang, “Long-term correlation tracking,” 2015 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, pp. 5388-5396,
2015.
[4] Z. Kalal, K. Mikolajczyk, and J. Matas, “Tracking-Learning-Detection,” TPAMI, 34(7), 2012.
[5] C. Stauffer and W. E. L. Grimson, “Adaptive Background Mixture Models for Real-Time Tracking,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 246-252, 1999.
[6] H. S. Demir and A. E. Cetin, “Co-difference based object tracking algorithm for infrared videos,”
IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, 2016, pp. 434-438
[7] C. Kwan, D. Gribben, and T. Tran, “Tracking and Classification of Multiple Human Objects Directly
in Compressive Measurement Domain for Low Quality Optical Videos,” IEEE Ubiquitous
Computing, Electronics & Mobile Communication Conference, New York City. 2019.
[8] C. Kwan, B. Chou, J. Yang, and T. Tran, “Deep Learning based Target Tracking and Classification
Directly in Compressive Measurement for Low Quality Videos,” Signal & Image Processing: An
International Journal (SIPIJ), November 16, 2019.
[9] C. Kwan, D. Gribben, A. Rangamani, T. Tran, J. Zhang, R. Etienne-Cummings, “Detection and
Confirmation of Multiple Human Targets Using Pixel-Wise Code Aperture Measurements,” J.
Imaging. 6(6), 40, 2020.
[10] C. Kwan, B. Chou, J. Yang, and T. Tran, “Deep Learning based Target Tracking and Classification
for Infrared Videos Using Compressive Measurements,” Journal Signal and Information Processing,
November 2019.
[11] S. Lohit, K. Kulkarni, and P. K. Turaga, “Direct inference on compressive measurements using
convolutional neural networks,” Int. Conference on Image Processing. 2016. 1913-1917.
[12] A. Adler, M. Elad, and M. Zibulevsky, “Compressed Learning: A Deep Neural Network Approach,”
arXiv:1610.09615v1 [cs.CV]. 2016.
[13] Y. Xu and K. F. Kelly, “Compressed domain image classification using a multi-rate neural network,”
arXiv:1901.09983 [cs.CV]. 2019.
[14] Z. W. Wang, V. Vineet, F. Pittaluga, S. N. Sinha, O. Cossairt, and S. B. Kang, “Privacy-Preserving
Action Recognition Using Coded Aperture Videos,” IEEE Conference on Computer Vision and
Pattern Recognition (CVPR) Workshops. 2019.
[15] H. Vargas, Y. Fonseca, and H. Arguello, “Object Detection on Compressive Measurements using
Correlation Filters and Sparse Representation,” 26th European Signal Processing Conference
(EUSIPCO). 1960-1964, 2018.
[16] A. Değerli, S. Aslan, M. Yamac, B. Sankur, and M. Gabbouj, “Compressively Sensed Image
Recognition,” 7th European Workshop on Visual Information Processing (EUVIP), Tampere, 2018.
[17] P. Latorre-Carmona, V. J. Traver, J. S. Sánchez, and E. Tajahuerce, “Online reconstruction-free
single-pixel image classification,” Image and Vision Computing, 86, 2018.
[18] C. Li and W. Wang, “Detection and Tracking of Moving Targets for Thermal Infrared Video
Sequences,” Sensors, 18, 3944, 2018.
[19] Y. Tan, Y. Guo, and C. Gao, “Background subtraction based level sets for human segmentation in
thermal infrared surveillance systems,” Infrared Phys. Technol., 61: 230–240, 2013.
[20] A. Berg, J. Ahlberg, and M. Felsberg, “Channel Coded Distribution Field Tracking for Thermal
Infrared Imagery,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Workshops; Las Vegas, NV, USA. pp. 1248–1256, 2016.
[21] C. Kwan, D. Gribben, B. Chou, B. Budavari, J. Larkin, A. Rangamani, T. Tran, J. Zhang, R. Etienne-
Cummings, “Real-Time and Deep Learning based Vehicle Detection and Classification using Pixel-
Wise Code Exposure Measurements,” Electronics, June 18, 2020.
[22] C. Kwan, D. Gribben, and B. Budavari, “Target Detection and Classification Performance
Enhancement Using Super-Resolution Infrared Videos,” Signal & Image Processing: An International
Journal (SIPIJ), vol. 12, no. 2, April 30, 2021.
[23] C. Kwan and D. Gribben, “Target Detection and Classification Improvements Using Contrast
Enhanced 16-bit Infrared Videos,” Signal & Image Processing: An International Journal (SIPIJ), vol.
12, no. 1, February 28, 2021. DOI: 10.5121/sipij.2021.12103.
[24] C. Kwan and B. Budavari, “Enhancing Small Moving Target Detection Performance in Low Quality
and Long Range Infrared Videos Using Optical Flow Techniques,” Remote Sensing, 12(24), 4024,
December 9, 2020.
[25] Y. Chen, G. Zhang, Y. Ma, J. U. Kang, and C. Kwan, “Small Infrared Target Detection based on Fast
Adaptive Masking and Scaling with Iterative Segmentation,” IEEE Geoscience and Remote Sensing
Letters, January 2021.
[26] C. Kwan and B. Budavari, “A High Performance Approach to Detecting Small Targets in Long
Range Low Quality Infrared Videos,” arXiv:2012.02579, 2020.
[27] X. Kong, C. Yang, S. Cao, C. Li and Z. Peng, “Infrared Small Target Detection via Nonconvex
Tensor Fibered Rank Approximation,” in IEEE Transactions on Geoscience and Remote Sensing,
doi: 10.1109/TGRS.2021.3068465.
[28] D. Ma, L. Dong and W. Xu, “A Method for Infrared Sea-Sky Condition Judgment and Search
System: Robust Target Detection via PLS and CEDoG,” in IEEE Access, vol. 9, pp. 1439-1453,
2021, doi: 10.1109/ACCESS.2020.3047736.
[29] Y. Dai, Y. Wu, F. Zhou and K. Barnard, “Attentional Local Contrast Networks for Infrared Small
Target Detection,” in IEEE Transactions on Geoscience and Remote Sensing, doi:
10.1109/TGRS.2020.3044958.
[30] P. Yang, L. Dong and W. Xu, “Infrared Small Maritime Target Detection Based on Integrated Target
Saliency Measure,” in IEEE Journal of Selected Topics in Applied Earth Observations and Remote
Sensing, vol. 14, pp. 2369-2386, 2021, doi: 10.1109/JSTARS.2021.3049847.
[31] D. Pang, T. Shan, P. Ma, W. Li, S. Liu and R. Tao, “A Novel Spatiotemporal Saliency Method for
Low-Altitude Slow Small Infrared Target Detection,” in IEEE Geoscience and Remote Sensing
Letters, doi: 10.1109/LGRS.2020.3048199.
[32] Q. Hou, Z. Wang, F. Tan, Y. Zhao, H. Zheng and W. Zhang, “RISTDnet: Robust Infrared Small
Target Detection Network,” in IEEE Geoscience and Remote Sensing Letters, doi:
10.1109/LGRS.2021.3050828.
[33] S. Du, P. Zhang, B. Zhang and H. Xu, “Weak and Occluded Vehicle Detection in Complex Infrared
Environment Based on Improved YOLOv4,” in IEEE Access, vol. 9, pp. 25671-25680, 2021, doi:
10.1109/ACCESS.2021.3057723.
[34] Z. Song, J. Yang, D. Zhang, S. Wang and Z. Li, “Semi-Supervised Dim and Small Infrared Ship
Detection Network Based on Haar Wavelet,” in IEEE Access, vol. 9, pp. 29686-29695, 2021, doi:
10.1109/ACCESS.2021.3058526.
[35] M. Wan, X. Ye, X. Zhang, Y. Xu, G. Gu and Q. Chen, “Infrared Small Target Tracking via Gaussian
Curvature-Based Compressive Convolution Feature Extraction,” in IEEE Geoscience and Remote
Sensing Letters, doi: 10.1109/LGRS.2021.3051183.
[36] M. Zhao, W. Li, L. Li, P. Ma, Z. Cai and R. Tao, “Three-Order Tensor Creation and Tucker
Decomposition for Infrared Small-Target Detection,” in IEEE Transactions on Geoscience and
Remote Sensing, doi: 10.1109/TGRS.2021.3057696.
[37] H. Sun et al., “Fusion of Infrared and Visible Images for Remote Detection of Low-Altitude Slow-
Speed Small Targets,” in IEEE Journal of Selected Topics in Applied Earth Observations and Remote
Sensing, vol. 14, pp. 2971-2983, 2021, doi: 10.1109/JSTARS.2021.3061496.
[38] A. Raza et al., “IR-MSDNet: Infrared and Visible Image Fusion Based On Infrared Features and
Multiscale Dense Network,” in IEEE Journal of Selected Topics in Applied Earth Observations and
Remote Sensing, vol. 14, pp. 3426-3437, 2021, doi: 10.1109/JSTARS.2021.3065121.
[39] W. Xue, J. Qi, G. Shao, Z. Xiao, Y. Zhang and P. Zhong, “Low-Rank Approximation and Multiple
Sparse Constraints Modelling for Infrared Low-Flying Fixed-Wing UAV Detection,” in IEEE Journal
of Selected Topics in Applied Earth Observations and Remote Sensing, doi:
10.1109/JSTARS.2021.3069032.
[40] J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” arXiv:1804.02767, 2018.
[41] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” Advances in Neural Information Processing Systems, 2015.
[42] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “YOLOv4: Optimal Speed and Accuracy of Object
Detection,” arXiv preprint arXiv:2004.10934, 2020.
[43] MOT Challenge, motchallenge.net/, accessed May 18, 2021.
AUTHORS
Chiman Kwan received his Ph.D. degree in electrical engineering from the University of Texas at
Arlington in 1993. He has written one book, four book chapters, 15 patents, 75 invention disclosures, 380
technical papers in journals and conferences, and 550 technical reports. Over the past 25 years, he has been
the PI/Program Manager of over 120 diverse projects with total funding exceeding 36 million dollars. He is
also the founder and Chief Technology Officer of Signal Processing, Inc. and Applied Research LLC. He
received numerous awards from IEEE, NASA, and some other agencies and has given keynote speeches in
several international conferences. He is an Associate Editor of IEEE Trans. Geoscience and Remote
Sensing and a Japan Society for Promotion of Science (JSPS) Fellow.
David Gribben received his B.S. in Computer Science and Physics from McDaniel College, Maryland,
USA, in 2015. He is a software engineer at ARLLC. He has been involved in diverse projects, including
mission planning for UAVs, target detection and classification, and remote sensing.