This document provides a survey of video frame interpolation techniques using deep learning. It discusses benchmark datasets commonly used to evaluate methods, including UCF101, Middlebury, and Vimeo-90K. Two main categories of methods are covered: kernel-based methods that estimate pixel-wise convolution kernels to generate interpolated frames, and flow-based methods that rely on optical flow estimation. Recent works that use adaptive separable convolutions, deformable convolutions, and GANs are summarized. The survey concludes that while kernel-based methods have improved, flow-based techniques still achieve more natural results, and combining frame interpolation with GANs shows promise.
AN EFFICIENT MODEL FOR VIDEO PREDICTION (IRJET Journal)
This document proposes a lightweight neural network model for video prediction. It introduces depthwise and pointwise 3D convolutions to reduce computational cost and memory usage compared to traditional convolutional models. The proposed model uses an encoder-decoder architecture with depthwise and pointwise convolutional blocks in both the encoder and decoder. The encoder extracts spatiotemporal features from input video frames, and the decoder generates predicted future frames from the encoded features. Experimental results on standard datasets show the proposed model outperforms state-of-the-art methods in terms of prediction accuracy and efficiency.
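As a rough illustration of why depthwise plus pointwise convolutions reduce cost (a back-of-the-envelope sketch; the channel counts and function names below are hypothetical, not taken from the paper), compare the parameter counts of a standard 3D convolution with its depthwise-separable factorization:

```python
# Parameter counts for a standard 3D convolution vs. a depthwise + pointwise
# (separable) 3D convolution. Illustrative only; sizes are assumptions.

def conv3d_params(c_in, c_out, k):
    # Standard 3D conv: one k*k*k filter per (input channel, output channel) pair.
    return c_in * c_out * k ** 3

def separable_conv3d_params(c_in, c_out, k):
    # Depthwise: one k*k*k filter per input channel.
    # Pointwise: a 1x1x1 conv that mixes channels.
    return c_in * k ** 3 + c_in * c_out

c_in, c_out, k = 64, 128, 3
std = conv3d_params(c_in, c_out, k)            # 221184 parameters
sep = separable_conv3d_params(c_in, c_out, k)  # 9920 parameters
print(std, sep, round(std / sep, 1))           # roughly a 22x reduction
```

The same arithmetic explains the memory and compute savings the abstract claims, since multiply-accumulate counts scale with the same factors.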
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif... (CSCJournals)
In today's era of digitization and fast internet, a large number of videos are uploaded to websites, so a mechanism is required to access these videos accurately and efficiently. Semantic concept detection achieves this task and is used in many applications such as multimedia annotation, video summarization, indexing, and retrieval. Video retrieval based on semantic concepts is an efficient but challenging research area. Semantic concept detection bridges the semantic gap between the low-level features extracted from a key-frame or shot of a video and their high-level interpretation as semantics: it automatically assigns labels to videos from a predefined vocabulary, and the task is treated as a supervised machine learning problem. The support vector machine (SVM) emerged as the default classifier choice for this task, but recently the deep convolutional neural network (CNN) has shown exceptional performance in this area, although a CNN requires a large dataset for training. In this paper, we present a framework for semantic concept detection using a hybrid model of SVM and CNN. Global features such as color moments, HSV histogram, wavelet transform, grey-level co-occurrence matrix, and edge orientation histogram are selected as the low-level features extracted from the annotated ground-truth video dataset of TRECVID. In a second pipeline, deep features are extracted using a pretrained CNN. The dataset is partitioned into three segments to deal with the data imbalance issue. The two classifiers are trained separately on all segments, and a fusion of their scores is performed to detect the concepts in the test dataset. System performance is evaluated using Mean Average Precision on the multi-label dataset, and the performance of the proposed hybrid SVM-CNN framework is comparable to existing approaches.
Design and Analysis of Quantization Based Low Bit Rate Encoding System (ijtsrd)
This document summarizes research on developing a low bit rate encoding system for video compression using vector quantization. It first discusses how vector quantization can achieve high compression ratios and has been used widely in image and speech coding. It then describes the methodology used, which involves taking video frames as input, downsampling the frames to extract pixels, applying vector quantization, and detecting edges on the compressed frames to check compression quality. Finally, it discusses the results of testing the approach on MATLAB and presents conclusions on the advantages of the proposed algorithm for very low bit rate video coding applications.
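The core of vector quantization is mapping each block of pixels to the index of its nearest codeword, so only small indices need to be transmitted. A minimal sketch of that assignment step, assuming a fixed codebook (the vectors and codebook here are invented for illustration, not from the paper):

```python
import numpy as np

def quantize(vectors, codebook):
    """Map each input vector to the index of its nearest codeword (L2 distance)."""
    # Broadcast (N, 1, D) against (1, K, D) to get an (N, K) distance matrix.
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

# Toy 2-D example: two codewords, two pixel-block vectors.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
blocks = np.array([[0.1, 0.2], [0.9, 0.8]])
print(quantize(blocks, codebook))  # [0 1]
```

The compression ratio comes from sending only these indices (log2 K bits each) instead of full pixel blocks; the decoder looks the codewords back up in the same codebook.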
The advances of this technological era have resulted in an enormous pool of information, stored at multiple places globally and in multiple formats. This article presents a methodology for extracting the video lectures delivered by experts in the domain of Computer Science using a Generalized Gamma Mixture Model. The feature extraction is based on DCT transformations. To develop the model, the dataset is pooled from YouTube video lectures in the domain of Computer Science, and the generated outputs are evaluated using Precision and Recall.
The block matching algorithm (BMA) for motion estimation is very commonly used in current video coding standards such as H.26x and MPEG-x because of its simplicity and performance, and it is an important component of video compression. Motion estimation is becoming a bottleneck in many video applications because it must estimate the motion of objects. A homography exists between two frames of the video sequences captured by pan-tilt (PT) cameras during their movements, and this geometric relationship can be used to reduce spatial redundancy in the video. In this paper, I present a homography-based motion estimation algorithm and a comparative study of different algorithms, and I introduce a novel homography-based scheme for block motion estimation. The study provides an idea of the important trade-off between computational complexity, result quality, and the various applications. The algorithm can be implemented in MATLAB.
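The baseline that homography-based schemes are compared against is exhaustive (full-search) block matching: slide a block over a search window and keep the displacement with the lowest sum of absolute differences. A minimal NumPy sketch (frame sizes and the block position are arbitrary test values, not from the paper):

```python
import numpy as np

def full_search(ref, cur, top, left, bsize, radius):
    """Exhaustive block matching: find the displacement (dy, dx) of one block of
    the current frame within a +/- radius search window of the reference frame."""
    block = cur[top:top + bsize, left:left + bsize]
    best_sad, best_mv = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = np.abs(ref[y:y + bsize, x:x + bsize] - block).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

# Synthetic check: the "current" frame is the reference shifted by (2, 3),
# so the recovered motion vector for an interior block should be (-2, -3).
rng = np.random.default_rng(0)
ref = rng.random((32, 32))
cur = np.roll(ref, shift=(2, 3), axis=(0, 1))
print(full_search(ref, cur, 8, 8, 8, 4))  # (-2, -3)
```

Fast algorithms (three-step search, diamond search) and the homography-based approach in the paper all aim to reach a similar match while evaluating far fewer candidate displacements.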
A New Approach for video denoising and enhancement using optical flow Estimation (IRJET Journal)
This document proposes a new approach for video denoising and enhancement using optical flow estimation. It discusses using motion compensation via optical flow estimation along with principal component analysis (PCA) to provide fine video details. However, PCA has limitations in fully eliminating noise. The proposed method aims to replace PCA with wavelet transformation, which provides multi-resolution analysis and sparsity advantages for better denoising results in terms of PSNR and RMSE compared to PCA. It involves estimating optical flow between frames for motion compensation before applying wavelet transformation for noise removal and video reconstruction.
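The wavelet side of this pipeline rests on two steps: transform the (motion-compensated) signal into a sparse wavelet representation, then shrink the small detail coefficients that are mostly noise. A minimal one-level 1-D Haar sketch with soft thresholding (function names and the toy signal are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def haar_1level(x):
    """One level of the orthonormal 1-D Haar transform (x must have even length)."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)  # approximation (low-pass) coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail (high-pass) coefficients
    return a, d

def inverse_haar(a, d):
    """Invert haar_1level exactly."""
    x = np.empty(2 * a.size)
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def soft(d, t):
    """Soft thresholding: shrink coefficients toward zero by t."""
    return np.sign(d) * np.maximum(np.abs(d) - t, 0.0)

sig = np.array([4.0, 4.0, 8.0, 8.0])
a, d = haar_1level(sig)
rec = inverse_haar(a, soft(d, 0.0))  # threshold 0 gives perfect reconstruction
print(np.allclose(rec, sig))  # True
```

In a real denoiser the threshold is set from a noise estimate, and a multi-level 2-D transform replaces this single 1-D level; the multi-resolution structure is exactly the advantage over PCA that the abstract cites.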
Image super resolution using Generative Adversarial Network (IRJET Journal)
This document discusses using a generative adversarial network (GAN) for image super resolution. It begins with an abstract that explains super resolution aims to increase image resolution by adding sub-pixel detail. Convolutional neural networks are well-suited for this task. Recent years have seen interest in reconstructing super resolution video sequences from low resolution images. The document then reviews literature on image super resolution techniques including deep learning methods. It describes the methodology which uses a CNN to compare input images to a trained dataset to predict if high-resolution images can be generated from low-resolution images.
The document summarizes a research paper that proposes a method to summarize parking surveillance footage. The method first pre-processes the raw footage to extract only frames containing vehicles. These frames are then classified using a CNN model to detect vehicles and recognize license plates. The classified objects and license plate numbers are used to generate a textual summary of the vehicles in the footage, making it easier for users to review large amounts of surveillance video. The paper discusses related work on video summarization techniques and provides details of the proposed methodology, which includes preprocessing footage, extracting features from frames containing vehicles, using CNNs for object detection and license plate recognition, and generating a summarized video and text report.
This document describes a proposed method for real-time object detection using Single Shot Multi-Box Detection (SSD) with the MobileNet model. SSD is a single, unified network for object detection that eliminates feature resampling and combines predictions. MobileNet is used to create a lightweight network by employing depthwise separable convolutions, which significantly reduces model size compared to regular convolutions. The proposed SSD with MobileNet model achieved improved accuracy in identifying real-time household objects while maintaining the detection speed of SSD.
A leading water utility company in the USA was facing the challenge of improving its pipeline inspection process to reduce human error and manual inspection time. Pipeline Anomaly Detection automates the identification of defects in pipeline videos: a camera records the observations, and the system then generates a report.
Residual balanced attention network for real-time traffic scene semantic segm... (IJECEIAES)
Intelligent transportation systems (ITS) are among the most actively researched topics of this century. Autonomous driving in particular involves very advanced tasks for road safety monitoring, including identifying dangers on the road and protecting pedestrians. In the last few years, deep learning (DL) approaches, especially convolutional neural networks (CNNs), have been used extensively to solve ITS problems such as traffic scene semantic segmentation and traffic sign classification. Semantic segmentation is an important task that has long been addressed in computer vision (CV). Indeed, traffic scene semantic segmentation using CNNs requires high precision with few computational resources in order to perceive and segment the scene in real time. However, related work often focuses on only one aspect, either the precision or the number of computational parameters. In this regard, we propose RBANet, a robust and lightweight CNN that uses a newly proposed balanced attention module and a newly proposed residual module. We then simulated the proposed RBANet with three loss functions to find the best combination, using only 0.74M parameters. RBANet has been evaluated on CamVid, the most widely used dataset in semantic segmentation, and it performs well in terms of parameter requirements and precision compared to related work.
An Analysis of Various Deep Learning Algorithms for Image Processing (vivatechijri)
The various applications of image processing have given it a wide scope in data analysis. Machine learning algorithms provide a powerful environment for effectively training models to identify the various entities of an image and segment them accordingly. Although image classifiers such as Support Vector Machines (SVM) or Random Forest algorithms do justice to the task, deep learning algorithms such as Artificial Neural Networks (ANN) and their descendants, most notably the well-known and extremely powerful Convolutional Neural Network (CNN), bring a new dimension to the image processing domain: far higher accuracy and computational power for classifying images and segregating their entities as individual components of the working region of the image. The major focus is on the Region-based Convolutional Neural Network (R-CNN) algorithm and how well it provides pixel-level segmentation through its improved successors, the Fast, Faster, and Mask R-CNN versions.
IRJET-Multiple Object Detection using Deep Neural Networks (IRJET Journal)
The document describes a system for detecting multiple objects in videos using deep convolutional neural networks. The system first uses a Region Proposal Network to generate candidate object regions in each frame. It then applies a convolutional neural network to the full frame to extract features, and uses those features to classify and refine the bounding boxes for each proposed region. To improve detection across frames, the system also analyzes results from consecutive frames using a post-processing algorithm. The goal is to enhance confidence for consistently detected objects over time. Evaluation shows the approach effectively detects multiple objects in scenes from video frames.
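One plausible form of the cross-frame post-processing described above (a sketch under assumptions; the exact rule in the paper is not specified in this summary) is to raise the confidence of a current-frame detection when it overlaps a detection from the previous frame:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def boost_consistent(prev_dets, cur_dets, iou_thr=0.5, bonus=0.1):
    """Raise the score of current detections that overlap a previous-frame
    detection; detections are (box, score) pairs. Threshold/bonus are assumed."""
    out = []
    for box, score in cur_dets:
        if any(iou(box, pbox) >= iou_thr for pbox, _ in prev_dets):
            score = min(1.0, score + bonus)
        out.append((box, score))
    return out

prev = [((10, 10, 50, 50), 0.8)]
cur = [((12, 11, 52, 49), 0.7),     # overlaps the previous detection -> boosted
       ((200, 200, 240, 240), 0.6)]  # new region -> unchanged
print(boost_consistent(prev, cur))
```

Temporally consistent objects thus accumulate confidence, while one-frame false positives do not, which matches the stated goal of the system.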
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption (IJAEMSJORNAL)
In recent years, the modeling of human behaviors and patterns of activity for the recognition or detection of special events has attracted considerable research interest. Various methods abound for building intelligent vision systems aimed at understanding the scene and making correct semantic inferences from the observed dynamics of moving targets. Many systems include detection, storage of video information, and human-computer interfaces. Here we present not only an update that expands previous similar surveys but also an emphasis on contextual abnormal detection of human activity, especially in video surveillance applications. The main purpose of this survey is to identify existing methods extensively and to characterize the literature in a manner that brings key challenges to attention.
Review On Different Feature Extraction Algorithms (IRJET Journal)
This document discusses different feature extraction algorithms that can be used in wireless visual sensor networks (WVSN). It begins by explaining the traditional "compress-then-analyze" approach and introduces the newer "analyze-then-compress" approach. The document then reviews several real-valued and binary feature extraction algorithms such as SIFT, SURF, BRIEF, BRISK, and BinBoost. It proposes a video coding system that uses the BRISK algorithm to extract features, encodes the features using entropy coding, and transmits the data to a base station for matching with stored image data. Diagrams of the proposed system and the BRISK algorithm are also included.
This document discusses human action recognition in videos using two deep learning models: Long-Term Recurrent Convolutional Networks (LRCN) and Convolutional Long Short-Term Memory (ConvLSTM). LRCN combines convolutional and LSTM layers to analyze video frame sequences, while ConvLSTM integrates convolutional operations within LSTM cells. The document aims to compare the performance of these two models on the UCF101 action recognition dataset. It provides background on related work, describes the proposed methodology including the LRCN and ConvLSTM models, and outlines the experimental procedure to evaluate and compare the two approaches on video classification tasks.
Key Frame Extraction for Salient Activity Recognition (Suhas Pillai)
This document proposes a key frame extraction method for efficient and accurate activity recognition in videos using deep learning. It introduces a multi-stream convolutional neural network architecture that takes in multiple representations of video data as input, including different color spaces (RGB, YCbCr) and optical flow. Key frames are extracted from videos based on optical flow, representing important moments while reducing computational requirements compared to using all frames. The method is evaluated on the UCF-101 dataset and shows promising results for video classification using different combinations of color spaces and optical flow as input to the multi-stream CNN model.
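The key-frame idea can be sketched with a much cheaper motion proxy than the optical flow the paper uses: rank frames by the mean absolute difference from their predecessor and keep the top k (this substitution and all names below are illustrative assumptions, not the paper's method):

```python
import numpy as np

def key_frames(frames, k):
    """Pick the k frames with the largest mean absolute difference from their
    predecessor -- a cheap stand-in for optical-flow magnitude."""
    motion = [0.0] + [float(np.abs(frames[i] - frames[i - 1]).mean())
                      for i in range(1, len(frames))]
    order = sorted(range(len(frames)), key=lambda i: motion[i], reverse=True)
    return sorted(order[:k])  # return selected indices in temporal order

# Toy video: only the transition into and out of frame 2 carries motion.
static = np.zeros((4, 4))
moving = np.ones((4, 4))
video = [static, static, moving, static, static]
print(key_frames(video, 2))  # [2, 3]
```

Replacing the frame-difference score with per-frame optical-flow magnitude recovers the paper's selection criterion while keeping the same top-k structure.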
Deep learning-based switchable network for in-loop filtering in high efficie... (IJECEIAES)
Video codecs are undergoing a smart transition in this era, and the effect of deep learning on video compression is a future research area that has not yet been fully investigated. The paper's goal is to reduce the ringing and artifacts that loop filtering causes in high-efficiency video compression. Even though much research is being done to lessen this effect, many improvements can still be made. In this paper we focus on an intelligent solution for improving in-loop filtering in high efficiency video coding (HEVC) using a deep convolutional neural network (CNN). The paper proposes the design and implementation of deep CNN-based loop filtering using a series of 15 CNN networks followed by a combine-and-squeeze network that improves feature extraction. The resulting output is free from double enhancement, and the peak signal-to-noise ratio is improved by 0.5 dB compared to existing techniques. The experiments then demonstrate that improving coding efficiency by pipelining this network into the current codec and using it for higher quantization parameters (QP) is more effective than using it separately. Coding efficiency is improved by an average of 8.3% with the switching-based deep CNN in-loop filtering.
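The 0.5 dB gain above is measured with peak signal-to-noise ratio, which is defined directly from the mean squared error between the reference and reconstructed frames. A small self-contained sketch of the metric (the toy images are invented for illustration):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((8, 8), 100.0)
noisy = ref + 10.0  # constant error of 10 gives MSE = 100
print(round(psnr(ref, noisy), 2))  # 28.13
```

A 0.5 dB PSNR improvement corresponds to roughly a 10% reduction in mean squared error, which is why fractions of a dB are meaningful in codec comparisons.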
The International Journal of Engineering and Science (The IJES)
This document summarizes and compares different techniques for moving object detection in video surveillance systems. It discusses background subtraction, background estimation, and adaptive contrast change detection methods. It finds that while traditional methods work for single objects, correlation between frames performs better for multiple objects or poor lighting conditions, as it detects changes between frames. The document evaluates several algorithms and concludes correlation significantly improves output and performance even with multiple moving objects, making it suitable for night-time surveillance applications.
A Video Processing based System for Counting Vehicles (IRJET Journal)
This document describes a video processing system for counting vehicles. The system processes video frames using discrete wavelet transform (DWT) features and a neural network. In the first phase, vehicle images are extracted from videos and used to train a backpropagation neural network to detect vehicles based on DWT features. In the testing phase, video frames are extracted and the DWT features of frames showing the detection point are input to the neural network to detect vehicles. The system was tested on videos and achieved satisfactory counting accuracy ranging from 97.9% to 100%. The system provides an effective way to count vehicles for applications like traffic analysis.
IRJET- Analysis of Vehicle Number Plate Recognition (IRJET Journal)
This document summarizes research on techniques for enhancing license plate images and improving vehicle number plate recognition. It discusses using forward motion deblurring, scale-based region growing, blur estimation, blind image deblurring, and a no-reference metric to enhance license plate image details and obtain deblurred images. The document also reviews related work applying these and other techniques, such as binary SIFT, energy cooperation in wireless networks, and parametric blur estimation for natural image restoration. The goal is to develop more accurate and robust methods for automatic license plate recognition.
IRJET- Semantic Segmentation using Deep Learning (IRJET Journal)
The document discusses semantic image segmentation using deep learning techniques. It summarizes several state-of-the-art semantic segmentation models like U-Net, Dilated U-Net, PSPNet, Fully Convolutional DenseNets, Global Convolutional Network (GCN), DeepLabV3, and proposes an optimized FRRN model. It implements these models on the CamVid dataset and evaluates their performance using the intersection-over-union score, finding that the optimized FRRN model achieves a score of 0.87.
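The intersection-over-union score used to compare these models is straightforward to compute from binary masks; here is a minimal sketch for one class (the toy masks are invented, and a multi-class mean IoU would average this over classes):

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-union of two binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0  # both empty counts as a perfect match

gt = np.zeros((4, 4), dtype=int); gt[:, :2] = 1      # ground truth: left half
pred = np.zeros((4, 4), dtype=int); pred[:, :3] = 1  # prediction: one extra column
print(mask_iou(pred, gt))  # 8 / 12 = 0.666...
```

A score of 0.87, as reported for the optimized FRRN model, means predicted and ground-truth regions overlap in 87% of their union, averaged over the evaluation set.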
A Review on Natural Scene Text Understanding for Computer Vision using Machin... (IRJET Journal)
This document reviews various machine learning techniques for natural scene text understanding in computer vision. It discusses how deep learning methods like convolutional neural networks (CNNs) are more accurate than traditional image processing for text detection and recognition in complex, natural images. The document also summarizes several papers that propose different models using techniques like recurrent neural networks (RNNs), feature pyramid networks (FPNs), attention mechanisms, and connectionist temporal classification to detect and recognize text in scene images with challenges like variations in font, orientation, lighting and background complexity.
Human Action Recognition Using Deep Learning (IRJET Journal)
This document discusses human action recognition using deep learning models. It proposes using two deep learning models - Convolutional Neural Networks (CNN) and Long-term Recurrent Convolutional Networks (LRCN) - to recognize human actions in videos. The Kinetics dataset is used to train and evaluate the models. Results show that both CNN and LRCN are able to accurately recognize human actions like playing piano or archery in test videos. The LRCN model achieves slightly higher accuracy compared to the traditional two-stream CNN method.
This document provides an overview of a project that implemented image filtering using VHDL on an FPGA board. It discusses designing filters like average, Sobel, Gaussian, and Laplacian filters. Cache memory and a processing unit were developed to hold pixel values and apply filter kernels. Different methods for multiplication in the convolution process were evaluated. Results showed the output images after applying each filter both in software and on the FPGA board. In conclusion, FPGAs provide reconfigurable, accelerated processing for image applications like filtering compared to general purpose computers.
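The filtering step at the heart of that FPGA pipeline is a sliding-window multiply-accumulate over the image. A software reference in NumPy (not the project's VHDL, just an illustrative equivalent of applying a Sobel kernel) looks like this:

```python
import numpy as np

def convolve2d(img, kernel):
    """'Valid' 2D sliding-window filter (cross-correlation, as most image
    pipelines implement it): multiply each window by the kernel and sum."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = (img[y:y + kh, x:x + kw] * kernel).sum()
    return out

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
img = np.tile(np.arange(5, dtype=float), (5, 1))  # horizontal ramp 0..4
print(convolve2d(img, sobel_x))  # constant horizontal gradient of 8 everywhere
```

On the FPGA, the cache memory described above holds the pixel window, and the per-window multiplications in the inner loop are exactly the operations whose implementation methods the project compared.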
Review On Different Feature Extraction AlgorithmsIRJET Journal
This document discusses different feature extraction algorithms that can be used in visual sensor networks (WVSN). It begins by explaining the traditional "compress-then-analyze" approach and introduces the newer "analyze-then-compress" approach. The document then reviews several real-valued and binary feature extraction algorithms such as SIFT, SURF, BRIEF, BRISK, and BinBoost. It proposes a video coding system that uses the BRISK algorithm to extract features, encodes the features using entropy coding, and transmits the data to a base station for matching with stored image data. Diagrams of the proposed system and the BRISK algorithm are also included.
This document discusses human action recognition in videos using two deep learning models: Long-Term Recurrent Convolutional Networks (LRCN) and Convolutional Long Short-Term Memory (ConvLSTM). LRCN combines convolutional and LSTM layers to analyze video frame sequences, while ConvLSTM integrates convolutional operations within LSTM cells. The document aims to compare the performance of these two models on the UCF101 action recognition dataset. It provides background on related work, describes the proposed methodology including the LRCN and ConvLSTM models, and outlines the experimental procedure to evaluate and compare the two approaches on video classification tasks.
IGeekS Technologies is a company located in Bangalore, India. We have being recognized as a quality provider of hardware and software solutions for the student’s in order carry out their academic Projects. We offer academic projects at various academic levels ranging from graduates to masters (Diploma, BCA, BE, M. Tech, MCA, M. Sc (CS/IT)). As a part of the development training, we offer Projects in Embedded Systems & Software to the Engineering College students in all major disciplines.
Key Frame Extraction for Salient Activity RecognitionSuhas Pillai
This document proposes a key frame extraction method for efficient and accurate activity recognition in videos using deep learning. It introduces a multi-stream convolutional neural network architecture that takes in multiple representations of video data as input, including different color spaces (RGB, YCbCr) and optical flow. Key frames are extracted from videos based on optical flow, representing important moments while reducing computational requirements compared to using all frames. The method is evaluated on the UCF-101 dataset and shows promising results for video classification using different combinations of color spaces and optical flow as input to the multi-stream CNN model.
Deep learning-based switchable network for in-loop filtering in high efficie...IJECEIAES
The video codecs are focusing on a smart transition in this era. A future area of research that has not yet been fully investigated is the effect of deep learning on video compression. The paper’s goal is to reduce the ringing and artifacts that loop filtering causes when high-efficiency video compression is used. Even though there is a lot of research being done to lessen this effect, there are still many improvements that can be made. In This paper we have focused on an intelligent solution for improvising in-loop filtering in high efficiency video coding (HEVC) using a deep convolutional neural network (CNN). The paper proposes the design and implementation of deep CNN-based loop filtering using a series of 15 CNN networks followed by a combine and squeeze network that improves feature extraction. The resultant output is free from double enhancement and the peak signal-to-noise ratio is improved by 0.5 dB compared to existing techniques. The experiments then demonstrate that improving the coding efficiency by pipelining this network to the current network and using it for higher quantization parameters (QP) is more effective than using it separately. Coding efficiency is improved by an average of 8.3% with the switching based deep CNN in-loop filtering.
The International Journal of Engineering and Science (The IJES)theijes
This document summarizes and compares different techniques for moving object detection in video surveillance systems. It discusses background subtraction, background estimation, and adaptive contrast change detection methods. It finds that while traditional methods work for single objects, correlation between frames performs better for multiple objects or poor lighting conditions, as it detects changes between frames. The document evaluates several algorithms and concludes correlation significantly improves output and performance even with multiple moving objects, making it suitable for night-time surveillance applications.
A Video Processing based System for Counting VehiclesIRJET Journal
This document describes a video processing system for counting vehicles. The system processes video frames using discrete wavelet transform (DWT) features and a neural network. In the first phase, vehicle images are extracted from videos and used to train a backpropagation neural network to detect vehicles based on DWT features. In the testing phase, video frames are extracted and the DWT features of frames showing the detection point are input to the neural network to detect vehicles. The system was tested on videos and achieved satisfactory counting accuracy ranging from 97.9-100%. The system provides an effective way to count vehicles for applications like traffic analysis.
IRJET- Analysis of Vehicle Number Plate RecognitionIRJET Journal
This document summarizes research on techniques for enhancing license plate images and improving vehicle number plate recognition. It discusses using forward motion deblurring, scale-based region growing, blur estimation, blind image deblurring, and a no-reference metric to enhance license plate image details and obtain deblurred images. The document also reviews related work applying these and other techniques, such as binary SIFT, energy cooperation in wireless networks, and parametric blur estimation for natural image restoration. The goal is to develop more accurate and robust methods for automatic license plate recognition.
IRJET- Semantic Segmentation using Deep LearningIRJET Journal
The document discusses semantic image segmentation using deep learning techniques. It summarizes several state-of-the-art semantic segmentation models like U-Net, Dilated U-Net, PSPNet, Fully Convolutional DenseNets, Global Convolutional Network (GCN), DeepLabV3, and proposes an optimized FRRN model. It implements these models on the CamVid dataset and evaluates their performance using the intersection-over-union score, finding that the optimized FRRN model achieves a score of 0.87.
A Review on Natural Scene Text Understanding for Computer Vision using Machin...IRJET Journal
This document reviews various machine learning techniques for natural scene text understanding in computer vision. It discusses how deep learning methods like convolutional neural networks (CNNs) are more accurate than traditional image processing for text detection and recognition in complex, natural images. The document also summarizes several papers that propose different models using techniques like recurrent neural networks (RNNs), feature pyramid networks (FPNs), attention mechanisms, and connectionist temporal classification to detect and recognize text in scene images with challenges like variations in font, orientation, lighting and background complexity.
Human Action Recognition Using Deep LearningIRJET Journal
This document discusses human action recognition using deep learning models. It proposes using two deep learning models - Convolutional Neural Networks (CNN) and Long-term Recurrent Convolutional Networks (LRCN) - to recognize human actions in videos. The Kinetics dataset is used to train and evaluate the models. Results show that both CNN and LRCN are able to accurately recognize human actions like playing piano or archery in test videos. The LRCN model achieves slightly higher accuracy compared to the traditional two-stream CNN method.
This document provides an overview of a project that implemented image filtering using VHDL on an FPGA board. It discusses designing filters like average, Sobel, Gaussian, and Laplacian filters. Cache memory and a processing unit were developed to hold pixel values and apply filter kernels. Different methods for multiplication in the convolution process were evaluated. Results showed the output images after applying each filter both in software and on the FPGA board. In conclusion, FPGAs provide reconfigurable, accelerated processing for image applications like filtering compared to general purpose computers.
1. A COMPREHENSIVE SURVEY ON
VIDEO FRAME INTERPOLATION
TECHNIQUES - 2020
BY ANIL SINGH PARIHAR · DISHA VARSHNEY · KSHITIJA
PANDYA · ASHRAY AGGARWAL
PRESENTED BY :
SHAHABUDDIN AHMED
ROLL NO - 220EE1473
UNDER THE SUPERVISION OF :
DR. DIPTI PATRA
5. OVERVIEW OF THIS SURVEY
• A comprehensive evaluation of the advanced deep learning-based methods developed in the past decade, giving readers a detailed overview of the comparative performance and research results of recent state-of-the-art methods;
• Insightful analysis of system requirements, algorithm complexity, and quality of results, with a characterization of methods based on different schemes of image processing and frame rate conversion, emphasizing the pros and cons outlined in the reviewed works; and
• Discussion of open challenges in the frame rate conversion domain, identifying shortcomings of existing techniques and outlining potential directions for future research.
7. BENCHMARK DATASETS
Most frequently used datasets:
• UCF101
• Middlebury
• Vimeo-90K
Other relevant datasets:
• Adobe240
• KITTI
• DAVIS
8. UCF101
UCF101 is an action recognition dataset collected from user-uploaded YouTube videos.
It contains 13,320 videos divided into 101 action categories.
The 101 action categories are segregated into 25 groups, each containing 4-7 videos of the same action.
Videos in the same group generally share common features such as object appearance and background.
9. MIDDLEBURY
There are two subsets in the Middlebury dataset: the Other set, which provides ground-truth middle frames, and the Evaluation set, which hides the ground truth and is evaluated by uploading results to the benchmark website.
Four types of data test different aspects of optical flow algorithms:
(1) sequences with non-rigid motion, where the ground-truth flow is determined by tracking hidden fluorescent texture;
(2) realistic synthetic sequences;
(3) high-frame-rate video used to study interpolation error; and
(4) modified stereo sequences of static scenes.
10. VIMEO – 90K
It contains more than 89,000 videos of 720p or higher resolution, downloaded from the Vimeo video-sharing platform.
The motion of objects in the Vimeo-90K dataset is much larger than in UCF101.
It includes 51,313 triplets for training; each triplet consists of 3 consecutive video frames at a resolution of 448×256 pixels.
Authors train their networks to predict the middle frame (i.e., t = 0.5) of each triplet. The test set contains 3,782 triplets, and the dataset is widely used for comparing the performance of different algorithms thanks to its high-quality videos.
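The triplet setup described above (two outer frames as input, the middle frame at t = 0.5 as target) can be sketched as a simple data-preparation step. This is an illustrative sketch, not the actual Vimeo-90K loader; the `make_triplets` helper and array shapes are assumptions for the example:

```python
import numpy as np

def make_triplets(video):
    """Split a video of shape (T, H, W, C) into (inputs, targets), where
    each input is a (first, third) frame pair and the target is the
    middle frame, mirroring the Vimeo-90K triplet setup."""
    triplets = [(video[i], video[i + 2], video[i + 1])
                for i in range(len(video) - 2)]
    first = np.stack([t[0] for t in triplets])
    third = np.stack([t[1] for t in triplets])
    middle = np.stack([t[2] for t in triplets])
    return (first, third), middle

# toy "video": 5 frames of 4x4 grayscale
video = np.arange(5 * 4 * 4, dtype=np.float32).reshape(5, 4, 4, 1)
(inputs_a, inputs_b), targets = make_triplets(video)
print(inputs_a.shape, targets.shape)  # (3, 4, 4, 1) (3, 4, 4, 1)
```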
13. Paper/Article · Authors and Year · Research
FlowNet: Learning Optical Flow with Convolutional Networks (Dosovitskiy et al., 2015): The network is entirely convolutional and can be trained on any triplet of an image sequence, at varying resolutions. They used the Charbonnier loss function and trained on the KITTI dataset.
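The Charbonnier loss mentioned above is a smooth, differentiable variant of the L1 loss. A minimal sketch (the epsilon value is a typical choice, not necessarily the paper's exact setting):

```python
import numpy as np

def charbonnier(pred, target, eps=1e-3):
    """Charbonnier penalty: mean of sqrt(diff^2 + eps^2), a smooth L1."""
    diff = pred - target
    return np.mean(np.sqrt(diff * diff + eps * eps))

a = np.zeros((2, 2))
b = np.ones((2, 2))
print(round(float(charbonnier(a, b)), 4))  # ~1.0 for unit error
```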
Learning Image Matching by Simply Watching Video (Long et al., 2016): They developed a deep CNN that can be trained without any supervision. They exploited the temporal coherency that occurs naturally in real-world videos and used it to compute sensitivity maps, i.e., gradients with respect to the input via backpropagation.
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross-Convolutional Networks (Xue et al., 2016): Their network predicts multiple extrapolated frames from a single frame. The proposed network consists of five components: a variational auto-encoder that encodes motion information; a kernel decoder that learns motion kernels from the motion encoder's output; an image encoder that estimates feature maps from an image; a novel cross-convolutional layer that convolves the feature maps with the motion kernels; and finally, a regressor.
Video Frame Interpolation via Adaptive Convolution (Niklaus et al., 2017): They proposed a fully convolutional deep neural network that predicts a spatially adaptive convolution kernel for each pixel from the two given successive input frames. A separate kernel is calculated for each pixel of the interpolated frame to estimate its value; the predicted pixel-wise kernels are then convolved with the input frames to generate the interpolated frame.
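The adaptive-convolution idea can be sketched directly: each output pixel is a dot product of its own kernel with patches from both input frames. This is a simplified sketch, not the paper's implementation: the kernels are supplied as an array rather than predicted by a network, and one kernel is shared across both frames (the paper predicts a single kernel over the stacked patches):

```python
import numpy as np

def adaptive_conv(frame1, frame2, kernels):
    """Synthesize a frame where each pixel uses its own kernel.

    frame1, frame2: (H, W) input frames (padded internally).
    kernels: (H, W, k, k) per-pixel kernels.
    """
    H, W = frame1.shape
    k = kernels.shape[-1]
    pad = k // 2
    f1 = np.pad(frame1, pad)
    f2 = np.pad(frame2, pad)
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            patch1 = f1[y:y + k, x:x + k]
            patch2 = f2[y:y + k, x:x + k]
            # one kernel applied to both patches and averaged
            # (a simplification of the paper's stacked-patch kernel)
            out[y, x] = 0.5 * np.sum(kernels[y, x] * (patch1 + patch2))
    return out

H = W = 4
frame1 = np.ones((H, W))
frame2 = 3 * np.ones((H, W))
# identity-like kernels: all weight at the center tap
kernels = np.zeros((H, W, 3, 3))
kernels[:, :, 1, 1] = 1.0
mid = adaptive_conv(frame1, frame2, kernels)
print(mid[0, 0])  # 2.0 — the average of the two frames
```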
14. Video Frame Interpolation via Adaptive Separable Convolution (Niklaus et al., 2017): They improved on their previous approach by using separable convolutions. Instead of estimating a full 2D kernel for each pixel, they estimate a spatially adaptive pair of 1D convolution kernels per pixel, reducing the number of parameters to be estimated and keeping the algorithm within practical memory limits.
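The parameter saving comes from representing each 2D kernel as the outer product of two 1D kernels (k + k values instead of k × k). A small sketch; the box filters below are stand-ins for the network-predicted kernels:

```python
import numpy as np

def separable_kernel(kv, kh):
    """Build a k x k kernel as the outer product of two 1D kernels."""
    return np.outer(kv, kh)

k = 5
kv = np.ones(k) / k  # vertical 1D kernel (stand-in for a prediction)
kh = np.ones(k) / k  # horizontal 1D kernel
K = separable_kernel(kv, kh)
print(K.shape, round(float(K.sum()), 6))  # (5, 5) 1.0
# 2k = 10 parameters describe a 25-weight kernel
```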
Video Frame Synthesis using Deep Voxel Flow (Liu et al., 2017): It calculates a dense voxel flow and uses it to generate the interpolated frame. The voxels encode motion changes in the temporal domain, and intermediate frames are generated by trilinear interpolation. They proposed an end-to-end fully differentiable network adopting a fully convolutional encoder-decoder architecture with a bottleneck layer that computes the voxel flow.
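The trilinear sampling can be sketched in one spatial dimension plus time for clarity: the spatial part samples each frame along the predicted flow, and the temporal part blends the two samples. This is a 1D analogue for illustration only; the real method samples bilinearly in 2D:

```python
import numpy as np

def linear_sample(f, pos):
    """Linearly interpolate a 1D signal f at a fractional position."""
    pos = np.clip(pos, 0, len(f) - 1)
    lo = int(np.floor(pos))
    hi = min(lo + 1, len(f) - 1)
    w = pos - lo
    return (1 - w) * f[lo] + w * f[hi]

def voxel_flow_interp(f0, f1, flow, t=0.5):
    """Interpolate between two 1D 'frames' using a per-pixel flow.

    Each output pixel samples f0 backward along t*flow and f1 forward
    along (1-t)*flow, then blends the two samples by t — the 1D
    analogue of Deep Voxel Flow's trilinear sampling.
    """
    n = len(f0)
    out = np.zeros(n)
    for x in range(n):
        out[x] = ((1 - t) * linear_sample(f0, x - t * flow[x])
                  + t * linear_sample(f1, x + (1 - t) * flow[x]))
    return out

f0 = np.array([0.0, 1.0, 2.0, 3.0])
f1 = np.array([1.0, 2.0, 3.0, 4.0])
flow = np.zeros(4)  # no motion: reduces to a plain temporal blend
print(voxel_flow_interp(f0, f1, flow))  # [0.5 1.5 2.5 3.5]
```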
Deep Video Frame Interpolation using Cyclic Frame Generation (Liu et al., 2019): It introduces a novel cycle consistency loss, which can be integrated with any frame interpolation method. They postulated that, given three consecutive frames I1, I2, and I3, the frames generated from (I1, I2) and (I2, I3) should in turn generate another frame that matches I2 in a cyclic manner.
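The cycle consistency idea can be sketched with any interpolation function: interpolate within (I1, I2) and (I2, I3), then interpolate the two results and compare against I2. A trivial averaging interpolator stands in for the learned model here:

```python
import numpy as np

def interp(a, b):
    """Stand-in interpolator (simple average) for a learned model."""
    return 0.5 * (a + b)

def cycle_consistency_loss(i1, i2, i3):
    """Interpolating the two synthesized mid-frames should recover i2."""
    m12 = interp(i1, i2)   # frame between I1 and I2
    m23 = interp(i2, i3)   # frame between I2 and I3
    cycle = interp(m12, m23)
    return float(np.mean(np.abs(cycle - i2)))

# a linearly changing intensity: the cycle reconstructs I2 exactly
i1, i2, i3 = np.full((2, 2), 1.0), np.full((2, 2), 2.0), np.full((2, 2), 3.0)
print(cycle_consistency_loss(i1, i2, i3))  # 0.0
```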
Tracking and Frame Rate Enhancement for Real-Time 2D Human Pose Estimation (Vidanpathirana et al., 2019): It attempts to reduce optical flow errors by designing a pose tracking system that operates on a system of queues in a multi-threaded environment, providing a fast point-tracking solution to boost the frame rate of a pose estimation system.
15. AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation (Lee et al., 2020): They designed a new warping module called adaptive collaboration of flows (AdaCoF), based on an operation that can reference any number of pixels at any location.
Video Frame Interpolation via Deformable Separable Convolution (Cheng et al., 2020): They proposed adaptively estimating kernels from more relevant pixels, calling the result deformable separable convolution (DSepConv); this allows a smaller kernel size with relevant features for handling large motion. DSepConv uses an encoder-decoder network for feature extraction.
Multiple Video Frame Interpolation via Enhanced Deformable Separable Convolution (Cheng et al., 2020): They reduced the number of trainable parameters while maintaining the same results, and could generate multiple interpolated frames between two consecutive frames, making this the first kernel-based approach to do so. However, the results for arbitrary-time interpolation were not as good as those of state-of-the-art flow-based approaches.
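The deformable sampling used by DSepConv and AdaCoF can be sketched as gathering a handful of input pixels at learned offsets rather than on a fixed grid. In this sketch the offsets and weights are given as arrays instead of being predicted, and sampling is integer-valued for brevity (the real methods sample at fractional offsets):

```python
import numpy as np

def deformable_sample(frame, offsets, weights):
    """For each pixel, sum a few input pixels at per-pixel offsets.

    frame:   (H, W)
    offsets: (H, W, P, 2) integer (dy, dx) per pixel, P sample points
    weights: (H, W, P)    per-sample weights
    """
    H, W = frame.shape
    out = np.zeros((H, W))
    P = offsets.shape[2]
    for y in range(H):
        for x in range(W):
            for p in range(P):
                dy, dx = offsets[y, x, p]
                sy = int(np.clip(y + dy, 0, H - 1))
                sx = int(np.clip(x + dx, 0, W - 1))
                out[y, x] += weights[y, x, p] * frame[sy, sx]
    return out

frame = np.arange(16, dtype=float).reshape(4, 4)
# every pixel samples its right-hand neighbour with weight 1,
# standing in for a learned offset field
offsets = np.zeros((4, 4, 1, 2), dtype=int)
offsets[..., 0, 1] = 1
weights = np.ones((4, 4, 1))
shifted = deformable_sample(frame, offsets, weights)
print(shifted[0, 0])  # 1.0 — value of the pixel to the right
```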
16. Flow-Based Methods
High-quality video frame interpolation often depends on precise motion estimation: mathematical or deep learning models are trained to establish a strong correlation between consecutive frames, preserving the continuity of flow based on the actual optical displacement of flow vectors and the trajectory of visual components, aided by occlusion reasoning and color consistency methods. [1]
So far, flow-based methods have achieved results comparable to the latest GAN [2], CNN, and hybrid technologies, and are evolving to overcome the heavy computational requirements of deep learning methods on real-time benchmarks.
[1] Yazdi et al., Real-time object tracking based on an adaptive transition model and extended Kalman filter (2019)
[2] Caballero et al., Frame interpolation with multiscale deep loss functions and generative adversarial networks (2017)
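The motion-compensation step common to the flow-based methods above is backward warping: each output pixel samples the source frame at a position displaced by the flow, scaled to the target time t. A nearest-neighbour sketch keeps the code short; production interpolators sample bilinearly:

```python
import numpy as np

def backward_warp(frame, flow, t=0.5):
    """Warp `frame` toward time t along a per-pixel flow (dy, dx).

    frame: (H, W); flow: (H, W, 2). Nearest-neighbour sampling for
    brevity; real flow-based methods use bilinear sampling.
    """
    H, W = frame.shape
    out = np.zeros_like(frame)
    for y in range(H):
        for x in range(W):
            sy = int(np.clip(round(y - t * flow[y, x, 0]), 0, H - 1))
            sx = int(np.clip(round(x - t * flow[y, x, 1]), 0, W - 1))
            out[y, x] = frame[sy, sx]
    return out

frame = np.zeros((4, 4))
frame[0, 1] = 1.0                 # a bright pixel at (0, 1)
flow = np.zeros((4, 4, 2))
flow[..., 1] = 2.0                # uniform motion: 2 px to the right
warped = backward_warp(frame, flow, t=0.5)
print(np.argwhere(warped == 1.0))  # [[0 2]] — moved half the flow
```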
17. Path-Selective Interpolation
Dhruv Mahajan et al. (2009) [40] proposed a path framework, parallel to an inverse optical flow approach, that computes the background motion of an arbitrary intermediate pixel p in the input frames.
Bo Yan et al. (2013) improved this scheme with two leading innovations, the first being the use of standard optical flow to supervise path direction by constraining path length.
Yizhou Fan et al. (2016) further guided the optimization of path construction by combining the conventional path-based frameworks of Mahajan and Bo Yan with useful feature points extracted from the input frames. Integrating semantic information identifies critical pixels in the input frames and supervises the method toward accurate motion pattern recognition via optimal energy minimization, thus avoiding wrong path selection and achieving more natural results.
18. GAN-Based Interpolation
• Inspired by the evident success of deep convolutional neural networks in video processing tasks and the thorough research into GANs since Goodfellow's work in 2014, the first GAN interpolation network, designed by Mark Koren et al. (FINNiGAN [111]) in 2016, managed to cut the traditional "ghosting" and "tearing" artifacts and compensated for fast-motion discrepancies. A few developments are discussed below.
• FINNiGAN enhances frame rate by combining a convolutional neural network framework with generative adversarial networks.
• The idea is to prevent the common loss of structural information during up-sampling by using a SIN (structure interpolation network) that produces the structure of the intermediate frame.
19. • Zhe Hu et al. (2018) [112] proposed a multi-scale structure to share parameters across various layers and avoid costly global optimization, unlike usual flow-based methods. MSFSN (multi-scale frame synthesis network) became the first model to provide flexibility by parametrizing the temporal position of the frame of interest.
• Chenguang Li et al. (2018) [113] exploited a multi-scale CNN architecture to support long-range motion and employed an additional WGAN-GP (Wasserstein generative adversarial network loss with gradient penalty) [114] to achieve more natural results. The model has a slim generator network requiring relatively little storage and offering high-speed processing, thanks to its residual structure and cumulative generation of high-resolution frames.
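The WGAN-GP term mentioned above penalizes the critic's gradient norm away from 1 at points interpolated between real and generated samples. With a deliberately simplified linear critic f(x) = w·x the input gradient is just w, which lets a tiny sketch compute the penalty in closed form; a real implementation evaluates the gradient at each interpolate via autograd:

```python
import numpy as np

def wgan_gp_penalty(w, real, fake, lam=10.0, rng=None):
    """Gradient penalty for a linear critic f(x) = w @ x.

    For a linear critic the gradient w.r.t. any input is w itself, so
    the penalty reduces to lam * (||w|| - 1)^2, independent of the
    random interpolates (computed here only to show the recipe).
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    eps = rng.uniform(size=(real.shape[0], 1))
    x_hat = eps * real + (1 - eps) * fake  # where autograd would look
    grad_norm = np.linalg.norm(w)          # constant for a linear critic
    return lam * float((grad_norm - 1.0) ** 2)

real = np.ones((4, 3))
fake = np.zeros((4, 3))
w = np.array([1.0, 0.0, 0.0])              # unit-norm critic: no penalty
print(wgan_gp_penalty(w, real, fake))      # 0.0
```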
20. Conclusion
This paper puts forward a comprehensive survey of classic video frame interpolation techniques using deep learning, together with a broad overview of widely used benchmark evaluations. The five modalities exhibit unique features and branch into diverse choices of deep learning techniques that exploit their properties productively. The inherent spatial, temporal, and structural attributes of a video sequence are identified, and from the perspective of spatiotemporal structural encoding the pros and cons of available techniques are highlighted.
Based on the key insights underlined by the survey, the problem of video frame interpolation contains promising research opportunities. New techniques in deep learning will enlarge the scope for improving frame interpolation, GAN-based training paradigms show a promising future in the field, and combining frame interpolation with other video processing tasks also appears to interest researchers.