A COMPREHENSIVE SURVEY ON
VIDEO FRAME INTERPOLATION
TECHNIQUES - 2020
BY ANIL SINGH PARIHAR · DISHA VARSHNEY · KSHITIJA
PANDYA · ASHRAY AGGARWAL
PRESENTED BY :
SHAHABUDDIN AHMED
ROLL NO - 220EE1473
UNDER THE SUPERVISION OF :
DR. DIPTI PATRA
CONTENTS
Overview of The Survey
Benchmark Datasets
Video Frame interpolation Methods
Key Aspects:
• High Visual Quality
• Low Complexity
• High Efficiency of Interpolated Output Image
VFI Applications
Video post-processing
Surveillance
Video Restoration Tasks.
OVERVIEW OF THIS SURVEY
• A comprehensive evaluation of the advanced deep learning-based methods
developed in the past decade, giving readers a detailed overview
of the comparative performance and research results of recent state-of-the-art
methods.
• Insightful analysis of system requirements, algorithm complexity, quality of
results, and characterization of methods based on different schemes of image
processing and frame rate conversion, emphasizing the pros and cons of these
methods outlined in reviewed works.
• Discussion of potential challenges in the frame rate conversion domain, to
identify shortcomings of existing techniques and align potential
directions for future research.
VIDEO FRAME INTERPOLATION METHODS
BENCHMARK DATASETS
Most Frequently used Datasets : -
UCF101
MIDDLEBURY
VIMEO – 90K
Other Relevant Datasets : -
ADOBE240
KITTI
DAVIS
UCF101
• UCF101 is an action recognition dataset collected from user-uploaded
YouTube videos.
• The UCF101 dataset contains 13,320 videos divided into 101 action categories.
• The videos of each action category are organized into 25 groups, each
containing 4-7 videos of the action.
• Videos in the same group generally share common features such as
object appearance and background.
MIDDLEBURY
• The Middlebury dataset has two subsets: the Other set, which
provides the ground-truth middle frames, and the Evaluation set,
which hides the ground truth and is evaluated by uploading the results to
the benchmark website.
• There are four types of data to test different aspects of optical flow
algorithms:
  (1) sequences with non-rigid motion where the ground-truth flow is
  determined by tracking hidden fluorescent texture.
  (2) realistic synthetic sequences.
  (3) high-frame-rate video used to study interpolation error.
  (4) modified stereo sequences of static scenes.
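The interpolation error mentioned in item (3) can be illustrated with a short sketch. It assumes the error is measured as the root-mean-square difference between the interpolated frame and the ground-truth frame, which is how the Middlebury benchmark is commonly described; the function name and the toy frames are hypothetical.

```python
import numpy as np

def interpolation_error(pred, target):
    """RMS difference between an interpolated frame and its ground truth.

    Assumption: both frames share the same shape and intensity range.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("frames must have the same shape")
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Toy example: every pixel of the prediction is off by 5 grey levels.
target = np.full((4, 4), 100.0)
pred = target + 5.0
ie = interpolation_error(pred, target)
```

A lower value means the interpolated frame is closer to the hidden ground-truth frame.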
VIMEO – 90K
• It contains more than 89,000 videos of 720p or higher resolution, which
are downloaded from the Vimeo video sharing platform.
• The motion of objects in the Vimeo-90K dataset is much larger than that
of UCF101.
• It includes 51,313 triplets for training. Each triplet is made up of 3
consecutive video frames with a resolution of 448×256 pixels.
• Authors have trained their networks to predict the middle frame (i.e., t =
0.5) of each triplet. The test set contains 3,782 triplets. Thanks to its
high-quality videos, the dataset is widely used for comparing the
performance of different algorithms.
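As a rough illustration of how a triplet is used, the sketch below splits three consecutive frames into two inputs and one target (the middle frame at t = 0.5). The linear cross-fade is only a naive baseline that ignores motion, added here for comparison; it is not a method from the survey, and all function names are hypothetical.

```python
import numpy as np

def make_triplet_sample(frames):
    """Split a (3, H, W, C) triplet into ((first, last), middle)."""
    frames = np.asarray(frames, dtype=np.float32)
    if frames.shape[0] != 3:
        raise ValueError("expected exactly three consecutive frames")
    first, middle, last = frames
    return (first, last), middle

def blend_baseline(first, last, t=0.5):
    """Naive baseline (assumption): cross-fade the two inputs, no motion model."""
    return (1.0 - t) * first + t * last

# Random stand-in for one 448x256 RGB triplet (height 256, width 448).
rng = np.random.default_rng(0)
triplet = rng.random((3, 256, 448, 3), dtype=np.float32)
(first, last), target = make_triplet_sample(triplet)
prediction = blend_baseline(first, last)
```

A learned interpolator would replace `blend_baseline` and be trained so that its output matches `target`.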
OTHER DATASETS
Xiph
DAVIS
KITTI
CNN and kernel-based Methods
| Paper / Article | Authors and Year | Research |
| --- | --- | --- |
| FlowNet: Learning Optical Flow with Convolutional Networks | Dosovitskiy et al. (2015) | The network is entirely convolutional and can be trained using any triplet of an image sequence, which can be of different resolutions. They used the Charbonnier loss function and trained it on the KITTI dataset. |
| Learning Image Matching by Simply Watching Video | Long et al. (2016) | They developed a deep CNN that can be trained without any supervision. They exploited the temporal coherency that occurs naturally in real-world videos and used it to calculate sensitivity maps, i.e., gradients with respect to the input obtained via backpropagation. |
| Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks | Xue et al. (2016) | Their network predicts multiple extrapolated frames from a single frame. The proposed network consists of five components: (1) a variational auto-encoder that captures motion information; (2) a kernel decoder that learns motion kernels from the output of the motion encoder; (3) an image encoder that estimates feature maps from an image; (4) their novel cross-convolutional layer, which convolves the feature maps with the motion kernels; and (5) a regressor. |
| Video Frame Interpolation via Adaptive Convolution | Niklaus et al. (2017) | They proposed a fully convolutional deep neural network that predicts spatially adaptive convolutional kernels for each pixel from the two given successive input frames. A separate kernel is calculated for each pixel in the interpolated frame to estimate the value of that pixel. The predicted pixel-wise kernels are then convolved with the input frames to generate the interpolated frame. |
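The adaptive convolution idea of Niklaus et al. summarized above can be sketched as follows. This is a simplified illustration under stated assumptions: the frames are grayscale, the per-pixel kernels are passed in as arrays rather than predicted by a network, each frame has its own k×k kernel per output pixel, and the kernel is applied without flipping. The function and variable names are hypothetical.

```python
import numpy as np

def adaptive_convolve(frame1, frame2, kernels1, kernels2):
    """Apply a different kernel at every output pixel.

    frame1, frame2:     (H, W) grayscale input frames.
    kernels1, kernels2: (H, W, k, k) arrays, one k x k kernel per output
                        pixel and per input frame (k is assumed odd).
    """
    h, w = frame1.shape
    k = kernels1.shape[-1]
    pad = k // 2
    # Replicate the border so every pixel has a full k x k patch.
    p1 = np.pad(frame1, pad, mode="edge")
    p2 = np.pad(frame2, pad, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for y in range(h):
        for x in range(w):
            patch1 = p1[y:y + k, x:x + k]
            patch2 = p2[y:y + k, x:x + k]
            out[y, x] = (np.sum(kernels1[y, x] * patch1)
                         + np.sum(kernels2[y, x] * patch2))
    return out

# Sanity check: kernels with 0.5 at the centre reduce to frame averaging.
h, w, k = 6, 8, 5
rng = np.random.default_rng(0)
f1, f2 = rng.random((h, w)), rng.random((h, w))
centre = np.zeros((h, w, k, k))
centre[:, :, k // 2, k // 2] = 0.5
out = adaptive_convolve(f1, f2, centre, centre)
```

Off-centre kernel weights let a pixel draw its value from a displaced location, which is how a single kernel can account for both motion and resampling.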
| Paper / Article | Authors and Year | Research |
| --- | --- | --- |
| Video Frame Interpolation via Adaptive Separable Convolution | Niklaus et al. (2017) | They improved on their previous approach by using separable convolutions. Instead of estimating the whole kernel for each pixel, they estimated spatially adaptive pairs of 1D convolution kernels for each pixel, thus reducing the parameters to be estimated. It optimizes the algorithm within the allowable range. |
| Video Frame Synthesis using Deep Voxel Flow | Liu et al. (2017) | It calculates dense voxel flow and uses it to generate an interpolated frame. These voxels encase the motion changes in the temporal domain, and intermediate frames can be generated using trilinear interpolation. They proposed an end-to-end fully differentiable network that adopts a fully convolutional encoder–decoder architecture with a bottleneck layer that calculates voxel flow. |
| Deep Video Frame Interpolation using Cyclic Frame Generation | Liu et al. (2019) | It introduces a novel loss called cycle consistency loss, which can be integrated with any frame interpolation method. They postulated that, given three consecutive frames I1, I2, and I3, the frames interpolated from (I1, I2) and from (I2, I3) can in turn be interpolated to produce a frame that should match I2, closing the cycle. |
| Tracking and Frame Rate Enhancement for Real-Time 2D Human Pose Estimation | Vidanpathirana et al. (2019) | It attempted to reduce optical flow errors by designing a pose tracking system that operates on a system of queues in a multi-threaded environment and provides a fast point tracking solution to boost the frame rate of the pose estimation system. |
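The cycle consistency loss described above for cyclic frame generation can be sketched for any interpolator. The L1 distance and the averaging interpolator used in the demonstration are assumptions made for illustration; the names are hypothetical and do not come from the paper's code.

```python
import numpy as np

def cycle_consistency_loss(interpolate, i1, i2, i3):
    """Mean absolute error between I2 and its cyclic reconstruction.

    interpolate(a, b) is any function returning the frame midway
    between a and b.
    """
    i12 = interpolate(i1, i2)            # frame between I1 and I2
    i23 = interpolate(i2, i3)            # frame between I2 and I3
    i2_reconstructed = interpolate(i12, i23)
    return float(np.mean(np.abs(i2_reconstructed - i2)))

def average_interpolator(a, b):
    """Stand-in interpolator (assumption): plain frame averaging."""
    return 0.5 * (a + b)

base = np.arange(12, dtype=np.float64).reshape(3, 4)
# Brightness changes linearly over time: the cycle closes exactly.
loss_linear = cycle_consistency_loss(
    average_interpolator, base, base + 1.0, base + 2.0)
# Brightness changes non-linearly: the reconstruction misses I2.
loss_nonlinear = cycle_consistency_loss(
    average_interpolator, base, base + 1.0, base + 4.0)
```

In training, this term would be added to the usual reconstruction loss so that the interpolator is penalized whenever the cycle does not return to I2.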
| Paper / Article | Authors and Year | Research |
| --- | --- | --- |
| AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation | Lee et al. (2020) | They designed a new warping module called adaptive collaboration of flows (AdaCoF), based on an operation that can use any number of pixels at any location. |
| Video Frame Interpolation via Deformable Separable Convolution | Cheng et al. (2020) | They proposed to use more relevant pixels to estimate kernels adaptively, calling it deformable separable convolution (DSepConv), hence using a smaller kernel size with relevant features for handling large motion. DSepConv uses an encoder–decoder network for feature extraction. |
| Multiple Video Frame Interpolation via Enhanced Deformable Separable Convolution | Cheng et al. (2020) | They were able to reduce the number of parameters to be trained while maintaining the same results. They were also able to generate multiple interpolated frames between two consecutive frames, making it the first kernel-based approach to do so. However, the results for arbitrary-time interpolation were not as good as state-of-the-art flow-based approaches. |
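The parameter saving behind the separable kernel-based methods in these tables can be shown with a small sketch: a 2D kernel is represented by the outer product of a vertical and a horizontal 1D kernel, so n×n values per pixel shrink to 2n. The kernel size of 51 and the function names are illustrative assumptions.

```python
import numpy as np

def separable_kernel(vertical, horizontal):
    """Build the 2D kernel represented by a pair of 1D kernels."""
    return np.outer(vertical, horizontal)

def parameter_counts(n):
    """Values to estimate per pixel and input frame: full 2D vs. 1D pair."""
    return n * n, 2 * n

v = np.array([0.25, 0.5, 0.25])
h = np.array([0.25, 0.5, 0.25])
k2d = separable_kernel(v, h)

# Applying the 2D kernel to a patch equals applying the two 1D kernels.
patch = np.arange(9, dtype=np.float64).reshape(3, 3)
full_result = np.sum(k2d * patch)
separable_result = v @ patch @ h

full_count, separable_count = parameter_counts(51)
```

The trade-off is that a separable pair can only represent rank-1 kernels, which the methods above accept in exchange for far fewer values to predict per pixel.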
Flow Based Methods
• High-quality video frame interpolation often depends on precise motion
estimation techniques that train mathematical or deep learning models to
establish a strong correlation between consecutive frames. The aim is to preserve
the continuity of flow, based on the actual optical displacement of flow vectors
and the trajectory of visual components, via relevant occlusion reasoning and color
consistency methods. [1]
• So far, flow-based methods have achieved results comparable to
the latest GAN [2], CNN, and hybrid approaches, and they continue to evolve
to cope with the heavy computational requirements of deep learning
methods on real-time benchmarks.
[1] Yazdi et al., Real-time object tracking based on an adaptive transition model and extended Kalman filter (2019)
[2] Caballero et al., Frame interpolation with multiscale deep loss functions and generative adversarial networks (2017)
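A minimal sketch of the flow-based idea: once motion has been estimated, each input frame is warped toward the intermediate time step and the two warped frames are blended. The sketch assumes the flows from the intermediate frame to each input are already known, uses bilinear sampling, and ignores occlusion reasoning; all names are hypothetical.

```python
import numpy as np

def backward_warp(frame, flow):
    """Sample `frame` (H, W) at positions displaced by `flow` (H, W, 2).

    flow[..., 0] is the horizontal and flow[..., 1] the vertical
    displacement; out-of-range positions are clamped to the border.
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bottom = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bottom

def interpolate_midframe(frame0, frame1, flow_t0, flow_t1):
    """Blend both inputs after warping them to the intermediate time."""
    return 0.5 * (backward_warp(frame0, flow_t0)
                  + backward_warp(frame1, flow_t1))

# A bright pixel moves two columns to the right between the inputs.
frame0 = np.zeros((5, 7)); frame0[2, 2] = 1.0
frame1 = np.zeros((5, 7)); frame1[2, 4] = 1.0
flow_t0 = np.zeros((5, 7, 2)); flow_t0[..., 0] = -1.0  # look one column left
flow_t1 = np.zeros((5, 7, 2)); flow_t1[..., 0] = 1.0   # look one column right
mid = interpolate_midframe(frame0, frame1, flow_t0, flow_t1)
```

With correct flows the bright pixel lands halfway along its path instead of appearing twice at half intensity, which is the ghosting a plain cross-fade would produce.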
Path selective interpolation
• Dhruv Mahajan et al. (2009) [40] proposed a path framework, parallel to an inverse
optical flow approach, that computes the background motion of an arbitrary intermediate
pixel ‘p’ in the input frames.
• Bo Yan et al. (2013) improved the scheme mentioned above by introducing two
leading innovations: first, using standard optical flow to supervise path direction
by constraining path length.
• Yizhou Fan et al. (2016) further guided the optimization of path
construction by combining the conventional path-based framework of Mahajan and
Bo Yan with useful feature points extracted from the input frames. Integrating
semantic information identifies critical pixels in the input frames and supervises the
method for accurate motion pattern recognition via optimal energy minimization,
thus avoiding wrong path selection and achieving more natural results.
GAN-based interpolation
• Inspired by the evident success of deep convolutional neural networks in video
processing tasks and by the thorough research on GANs by Goodfellow since
2014, the first GAN interpolation network, designed by Mark Koren et al.
(FINNiGAN [111]) in 2016, managed to reduce the traditional “ghosting” and “tearing”
artifacts and compensated for fast-motion discrepancies. A few developments
are discussed below.
[Figure: the general architecture of a GAN framework]
• They enhanced the frame rate by combining a convolutional neural network
framework with generative adversarial networks, known as FINNiGAN.
• The idea is to prevent the common loss of structural information during up-sampling by
using a SIN (structure interpolation network) that produces the structure of the
intermediate frame.
• Zhe Hu et al. (2018) [112] proposed a multi-scale structure to share parameters
across various layers and cut costly global optimization, unlike usual flow-based
methods. MSFSN (multi-scale frame synthesis network) became the first model to
provide flexibility by parametrizing the temporal locus of the interest frame.
• Chenguang Li et al. (2018) [113] exploited a multi-scale CNN architecture to support
long-range and varied motion and employed an additional WGAN-GP (Wasserstein
generative adversarial network with gradient penalty) loss [114] to achieve more
natural results. The model has a slim generator network structure, requiring
relatively little storage and offering high-speed processing due to its residual structure and
cumulative generation of high-resolution frames.
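For reference, the WGAN-GP critic objective mentioned above is usually written as shown below. This is the general form of the loss, not the specific configuration used by Li et al.; the symbols are assumptions introduced here: D is the critic, P_r and P_g are the real and generated frame distributions, x-hat is sampled on straight lines between real and generated samples, and lambda is the penalty weight.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[ D(\tilde{x}) \big]
    - \mathbb{E}_{x \sim P_r}\big[ D(x) \big]
    + \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \Big[ \big( \lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1 \big)^2 \Big]
```

The last term pushes the gradient norm of the critic toward 1, which is what replaces weight clipping in the original Wasserstein GAN.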
Conclusion
This paper puts forward a comprehensive survey of classic video frame
interpolation techniques using deep learning. It presents a broad
overview of the widely used benchmark evaluations. The five
modalities exhibit their unique features and lead to diverse choices
of deep learning techniques that use their properties productively.
The inherent spatial, temporal, and structural attributes of a video
sequence are identified. From the aspect of spatiotemporal structural
encoding, the survey highlights the pros and cons of the available techniques.
Based on the key insights underlined by the survey, the problem of video
frame interpolation offers promising research opportunities.
New techniques in deep learning will enlarge the scope for improvement
of frame interpolation. GAN-based training paradigms show a
promising future in the field of frame interpolation. Combining frame
interpolation with other video processing tasks also seems to interest
researchers.
THANK YOU

More Related Content

Similar to 06-08 ppt.pptx

Parking Surveillance Footage Summarization
Parking Surveillance Footage SummarizationParking Surveillance Footage Summarization
Parking Surveillance Footage Summarization
IRJET Journal
 
kanimozhi2019.pdf
kanimozhi2019.pdfkanimozhi2019.pdf
kanimozhi2019.pdf
AshrafDabbas1
 
Pipeline anomaly detection
Pipeline anomaly detectionPipeline anomaly detection
Pipeline anomaly detection
GauravBiswas9
 
Residual balanced attention network for real-time traffic scene semantic segm...
Residual balanced attention network for real-time traffic scene semantic segm...Residual balanced attention network for real-time traffic scene semantic segm...
Residual balanced attention network for real-time traffic scene semantic segm...
IJECEIAES
 
An Analysis of Various Deep Learning Algorithms for Image Processing
An Analysis of Various Deep Learning Algorithms for Image ProcessingAn Analysis of Various Deep Learning Algorithms for Image Processing
An Analysis of Various Deep Learning Algorithms for Image Processing
vivatechijri
 
IRJET-Multiple Object Detection using Deep Neural Networks
IRJET-Multiple Object Detection using Deep Neural NetworksIRJET-Multiple Object Detection using Deep Neural Networks
IRJET-Multiple Object Detection using Deep Neural Networks
IRJET Journal
 
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionSecure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
IJAEMSJORNAL
 
Review On Different Feature Extraction Algorithms
Review On Different Feature Extraction AlgorithmsReview On Different Feature Extraction Algorithms
Review On Different Feature Extraction Algorithms
IRJET Journal
 
Human Action Recognition in Videos
Human Action Recognition in VideosHuman Action Recognition in Videos
Human Action Recognition in Videos
IRJET Journal
 
Matlab 2013 14 papers astract
Matlab 2013 14 papers astractMatlab 2013 14 papers astract
Matlab 2013 14 papers astract
IGEEKS TECHNOLOGIES
 
Key Frame Extraction for Salient Activity Recognition
Key Frame Extraction for Salient Activity RecognitionKey Frame Extraction for Salient Activity Recognition
Key Frame Extraction for Salient Activity Recognition
Suhas Pillai
 
Deep learning-based switchable network for in-loop filtering in high efficie...
Deep learning-based switchable network for in-loop filtering in  high efficie...Deep learning-based switchable network for in-loop filtering in  high efficie...
Deep learning-based switchable network for in-loop filtering in high efficie...
IJECEIAES
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
theijes
 
Dataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsDataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problems
PetteriTeikariPhD
 
A Video Processing based System for Counting Vehicles
A Video Processing based System for Counting VehiclesA Video Processing based System for Counting Vehicles
A Video Processing based System for Counting Vehicles
IRJET Journal
 
IRJET- Analysis of Vehicle Number Plate Recognition
IRJET- Analysis of Vehicle Number Plate RecognitionIRJET- Analysis of Vehicle Number Plate Recognition
IRJET- Analysis of Vehicle Number Plate Recognition
IRJET Journal
 
IRJET- Semantic Segmentation using Deep Learning
IRJET- Semantic Segmentation using Deep LearningIRJET- Semantic Segmentation using Deep Learning
IRJET- Semantic Segmentation using Deep Learning
IRJET Journal
 
A Review on Natural Scene Text Understanding for Computer Vision using Machin...
A Review on Natural Scene Text Understanding for Computer Vision using Machin...A Review on Natural Scene Text Understanding for Computer Vision using Machin...
A Review on Natural Scene Text Understanding for Computer Vision using Machin...
IRJET Journal
 
Human Action Recognition Using Deep Learning
Human Action Recognition Using Deep LearningHuman Action Recognition Using Deep Learning
Human Action Recognition Using Deep Learning
IRJET Journal
 
imagefiltervhdl.pptx
imagefiltervhdl.pptximagefiltervhdl.pptx
imagefiltervhdl.pptx
Akbarali206563
 

Similar to 06-08 ppt.pptx (20)

Parking Surveillance Footage Summarization
Parking Surveillance Footage SummarizationParking Surveillance Footage Summarization
Parking Surveillance Footage Summarization
 
kanimozhi2019.pdf
kanimozhi2019.pdfkanimozhi2019.pdf
kanimozhi2019.pdf
 
Pipeline anomaly detection
Pipeline anomaly detectionPipeline anomaly detection
Pipeline anomaly detection
 
Residual balanced attention network for real-time traffic scene semantic segm...
Residual balanced attention network for real-time traffic scene semantic segm...Residual balanced attention network for real-time traffic scene semantic segm...
Residual balanced attention network for real-time traffic scene semantic segm...
 
An Analysis of Various Deep Learning Algorithms for Image Processing
An Analysis of Various Deep Learning Algorithms for Image ProcessingAn Analysis of Various Deep Learning Algorithms for Image Processing
An Analysis of Various Deep Learning Algorithms for Image Processing
 
IRJET-Multiple Object Detection using Deep Neural Networks
IRJET-Multiple Object Detection using Deep Neural NetworksIRJET-Multiple Object Detection using Deep Neural Networks
IRJET-Multiple Object Detection using Deep Neural Networks
 
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionSecure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
 
Review On Different Feature Extraction Algorithms
Review On Different Feature Extraction AlgorithmsReview On Different Feature Extraction Algorithms
Review On Different Feature Extraction Algorithms
 
Human Action Recognition in Videos
Human Action Recognition in VideosHuman Action Recognition in Videos
Human Action Recognition in Videos
 
Matlab 2013 14 papers astract
Matlab 2013 14 papers astractMatlab 2013 14 papers astract
Matlab 2013 14 papers astract
 
Key Frame Extraction for Salient Activity Recognition
Key Frame Extraction for Salient Activity RecognitionKey Frame Extraction for Salient Activity Recognition
Key Frame Extraction for Salient Activity Recognition
 
Deep learning-based switchable network for in-loop filtering in high efficie...
Deep learning-based switchable network for in-loop filtering in  high efficie...Deep learning-based switchable network for in-loop filtering in  high efficie...
Deep learning-based switchable network for in-loop filtering in high efficie...
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Dataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problemsDataset creation for Deep Learning-based Geometric Computer Vision problems
Dataset creation for Deep Learning-based Geometric Computer Vision problems
 
A Video Processing based System for Counting Vehicles
A Video Processing based System for Counting VehiclesA Video Processing based System for Counting Vehicles
A Video Processing based System for Counting Vehicles
 
IRJET- Analysis of Vehicle Number Plate Recognition
IRJET- Analysis of Vehicle Number Plate RecognitionIRJET- Analysis of Vehicle Number Plate Recognition
IRJET- Analysis of Vehicle Number Plate Recognition
 
IRJET- Semantic Segmentation using Deep Learning
IRJET- Semantic Segmentation using Deep LearningIRJET- Semantic Segmentation using Deep Learning
IRJET- Semantic Segmentation using Deep Learning
 
A Review on Natural Scene Text Understanding for Computer Vision using Machin...
A Review on Natural Scene Text Understanding for Computer Vision using Machin...A Review on Natural Scene Text Understanding for Computer Vision using Machin...
A Review on Natural Scene Text Understanding for Computer Vision using Machin...
 
Human Action Recognition Using Deep Learning
Human Action Recognition Using Deep LearningHuman Action Recognition Using Deep Learning
Human Action Recognition Using Deep Learning
 
imagefiltervhdl.pptx
imagefiltervhdl.pptximagefiltervhdl.pptx
imagefiltervhdl.pptx
 

Recently uploaded

Introduction to Value Added Tax System.ppt
Introduction to Value Added Tax System.pptIntroduction to Value Added Tax System.ppt
Introduction to Value Added Tax System.ppt
VishnuVenugopal84
 
Eco-Innovations and Firm Heterogeneity. Evidence from Italian Family and Nonf...
Eco-Innovations and Firm Heterogeneity.Evidence from Italian Family and Nonf...Eco-Innovations and Firm Heterogeneity.Evidence from Italian Family and Nonf...
Eco-Innovations and Firm Heterogeneity. Evidence from Italian Family and Nonf...
University of Calabria
 
What's a worker’s market? Job quality and labour market tightness
What's a worker’s market? Job quality and labour market tightnessWhat's a worker’s market? Job quality and labour market tightness
What's a worker’s market? Job quality and labour market tightness
Labour Market Information Council | Conseil de l’information sur le marché du travail
 
Instant Issue Debit Cards - School Designs
Instant Issue Debit Cards - School DesignsInstant Issue Debit Cards - School Designs
Instant Issue Debit Cards - School Designs
egoetzinger
 
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
qntjwn68
 
How Non-Banking Financial Companies Empower Startups With Venture Debt Financing
How Non-Banking Financial Companies Empower Startups With Venture Debt FinancingHow Non-Banking Financial Companies Empower Startups With Venture Debt Financing
How Non-Banking Financial Companies Empower Startups With Venture Debt Financing
Vighnesh Shashtri
 
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
bbeucd
 
^%$Zone1:+971)581248768’][* Legit & Safe #Abortion #Pills #For #Sale In #Duba...
^%$Zone1:+971)581248768’][* Legit & Safe #Abortion #Pills #For #Sale In #Duba...^%$Zone1:+971)581248768’][* Legit & Safe #Abortion #Pills #For #Sale In #Duba...
^%$Zone1:+971)581248768’][* Legit & Safe #Abortion #Pills #For #Sale In #Duba...
mayaclinic18
 
2. Elemental Economics - Mineral demand.pdf
2. Elemental Economics - Mineral demand.pdf2. Elemental Economics - Mineral demand.pdf
2. Elemental Economics - Mineral demand.pdf
Neal Brewster
 
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
sameer shah
 
一比一原版(IC毕业证)帝国理工大学毕业证如何办理
一比一原版(IC毕业证)帝国理工大学毕业证如何办理一比一原版(IC毕业证)帝国理工大学毕业证如何办理
一比一原版(IC毕业证)帝国理工大学毕业证如何办理
conose1
 
Independent Study - College of Wooster Research (2023-2024) FDI, Culture, Glo...
Independent Study - College of Wooster Research (2023-2024) FDI, Culture, Glo...Independent Study - College of Wooster Research (2023-2024) FDI, Culture, Glo...
Independent Study - College of Wooster Research (2023-2024) FDI, Culture, Glo...
AntoniaOwensDetwiler
 
OAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptx
OAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptxOAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptx
OAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptx
hiddenlevers
 
Instant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School SpiritInstant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School Spirit
egoetzinger
 
Financial Assets: Debit vs Equity Securities.pptx
Financial Assets: Debit vs Equity Securities.pptxFinancial Assets: Debit vs Equity Securities.pptx
Financial Assets: Debit vs Equity Securities.pptx
Writo-Finance
 
G20 summit held in India. Proper presentation for G20 summit
G20 summit held in India. Proper presentation for G20 summitG20 summit held in India. Proper presentation for G20 summit
G20 summit held in India. Proper presentation for G20 summit
rohitsaxena882511
 
An Overview of the Prosocial dHEDGE Vault works
An Overview of the Prosocial dHEDGE Vault worksAn Overview of the Prosocial dHEDGE Vault works
An Overview of the Prosocial dHEDGE Vault works
Colin R. Turner
 
Role of Information Technology in Revenue - Prof Oyedokun.pptx
Role of Information Technology in Revenue  - Prof Oyedokun.pptxRole of Information Technology in Revenue  - Prof Oyedokun.pptx
Role of Information Technology in Revenue - Prof Oyedokun.pptx
Godwin Emmanuel Oyedokun MBA MSc PhD FCA FCTI FCNA CFE FFAR
 
Who Is the Largest Producer of Soybean in India Now.pdf
Who Is the Largest Producer of Soybean in India Now.pdfWho Is the Largest Producer of Soybean in India Now.pdf
Who Is the Largest Producer of Soybean in India Now.pdf
Price Vision
 
在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样
在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样
在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样
5spllj1l
 

Recently uploaded (20)

Introduction to Value Added Tax System.ppt
Introduction to Value Added Tax System.pptIntroduction to Value Added Tax System.ppt
Introduction to Value Added Tax System.ppt
 
Eco-Innovations and Firm Heterogeneity. Evidence from Italian Family and Nonf...
Eco-Innovations and Firm Heterogeneity.Evidence from Italian Family and Nonf...Eco-Innovations and Firm Heterogeneity.Evidence from Italian Family and Nonf...
Eco-Innovations and Firm Heterogeneity. Evidence from Italian Family and Nonf...
 
What's a worker’s market? Job quality and labour market tightness
What's a worker’s market? Job quality and labour market tightnessWhat's a worker’s market? Job quality and labour market tightness
What's a worker’s market? Job quality and labour market tightness
 
Instant Issue Debit Cards - School Designs
Instant Issue Debit Cards - School DesignsInstant Issue Debit Cards - School Designs
Instant Issue Debit Cards - School Designs
 
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
 
How Non-Banking Financial Companies Empower Startups With Venture Debt Financing
How Non-Banking Financial Companies Empower Startups With Venture Debt FinancingHow Non-Banking Financial Companies Empower Startups With Venture Debt Financing
How Non-Banking Financial Companies Empower Startups With Venture Debt Financing
 
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
 
^%$Zone1:+971)581248768’][* Legit & Safe #Abortion #Pills #For #Sale In #Duba...
^%$Zone1:+971)581248768’][* Legit & Safe #Abortion #Pills #For #Sale In #Duba...^%$Zone1:+971)581248768’][* Legit & Safe #Abortion #Pills #For #Sale In #Duba...
^%$Zone1:+971)581248768’][* Legit & Safe #Abortion #Pills #For #Sale In #Duba...
 
2. Elemental Economics - Mineral demand.pdf
2. Elemental Economics - Mineral demand.pdf2. Elemental Economics - Mineral demand.pdf
2. Elemental Economics - Mineral demand.pdf
 
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
 
一比一原版(IC毕业证)帝国理工大学毕业证如何办理
一比一原版(IC毕业证)帝国理工大学毕业证如何办理一比一原版(IC毕业证)帝国理工大学毕业证如何办理
一比一原版(IC毕业证)帝国理工大学毕业证如何办理
 
Independent Study - College of Wooster Research (2023-2024) FDI, Culture, Glo...
Independent Study - College of Wooster Research (2023-2024) FDI, Culture, Glo...Independent Study - College of Wooster Research (2023-2024) FDI, Culture, Glo...
Independent Study - College of Wooster Research (2023-2024) FDI, Culture, Glo...
 
OAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptx
OAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptxOAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptx
OAT_RI_Ep20 WeighingTheRisks_May24_Trade Wars.pptx
 
Instant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School SpiritInstant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School Spirit
 
Financial Assets: Debit vs Equity Securities.pptx
Financial Assets: Debit vs Equity Securities.pptxFinancial Assets: Debit vs Equity Securities.pptx
Financial Assets: Debit vs Equity Securities.pptx
 
G20 summit held in India. Proper presentation for G20 summit
G20 summit held in India. Proper presentation for G20 summitG20 summit held in India. Proper presentation for G20 summit
G20 summit held in India. Proper presentation for G20 summit
 
An Overview of the Prosocial dHEDGE Vault works
An Overview of the Prosocial dHEDGE Vault worksAn Overview of the Prosocial dHEDGE Vault works
An Overview of the Prosocial dHEDGE Vault works
 
Role of Information Technology in Revenue - Prof Oyedokun.pptx
Role of Information Technology in Revenue  - Prof Oyedokun.pptxRole of Information Technology in Revenue  - Prof Oyedokun.pptx
Role of Information Technology in Revenue - Prof Oyedokun.pptx
 
Who Is the Largest Producer of Soybean in India Now.pdf
Who Is the Largest Producer of Soybean in India Now.pdfWho Is the Largest Producer of Soybean in India Now.pdf
Who Is the Largest Producer of Soybean in India Now.pdf
 
在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样
在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样
在线办理(GU毕业证书)美国贡萨加大学毕业证学历证书一模一样
 

06-08 ppt.pptx

  • 1. A COMPREHENSIVE SURVEY ON VIDEO FRAME INTERPOLATION TECHNIQUES - 2020 BY ANIL SINGH PARIHAR · DISHA VARSHNEY · KSHITIJA PANDYA · ASHRAY AGGARWAL PRESENTED BY : SHAHABUDDIN AHMED ROLL NO - 220EE1473 UNDER THE SUPERVISION OF : DR. DIPTI PATRA
  • 2. CONTENTS Overview of The Survey Benchmark Datasets Video Frame interpolation Methods
  • 3. Key Aspects:  High Visual Quality  Low Complexity  High Efficiency of Interpolated Output Image
  • 5. OVERVIEW OF THIS SURVEY  A comprehensive evaluation of the advanced deep learning-based methods developed in the past decade, thereby assisting readers with a detailed overview of comparative performance and research results of recent state-of-the-art methods.  Insightful analysis of system requirements, algorithm complexity, quality of results, and characterization of methods based on different schemes of image processing and frame rate conversion, emphasizing the pros and cons of these methods outlined in reviewed works, and  discussion of potential challenges of frame rate conversion domain for identification of shortcomings of existing techniques and aligning potential directions for future
  • 7. BENCHMARK DATASETS Most Frequently used Datasets : - UCF101 MIDDLEBURY VIMEO – 90K Other Relevant Datasets : - ADOBE240 KITTI DAVIS
  • 8. UCF101  UCF101 is an action recognition dataset collected from user-uploaded YouTube videos.  There are 13320 videos divided into 101 categories in the UCF101 dataset.  In total, 101 action categories are segregated into 25 groups, each having 4-7 videos of the action  Videos in the same group generally contain some common features like object appearances and background.
  • 9. MIDDLEBURY  There are two sub- sets in Middlebury datasets. First, the Other set, which provides the ground-truth middle frames, second the Evaluation set, which hides the ground truth and is evaluated by uploading the results to the benchmark website.  There are four types of data to test different aspects of optical flow algorithms:  (1) sequences with non-rigid motion where the ground-truth flow is determined by tracking hidden fluorescent texture.  (2) realistic synthetic sequences.  (3) high-frame-rate video used to study interpolation error.  (4) modified stereo sequences of static scenes.
  • 10. VIMEO-90K
      - It contains more than 89,000 videos of 720p or higher resolution, downloaded from the Vimeo video sharing platform.
      - The motion of objects in the Vimeo-90K dataset is much larger than that in UCF101.
      - It includes 51,313 triplets for training. Each triplet is made up of 3 consecutive video frames with a resolution of 448×256 pixels.
      - Authors train their networks to predict the middle frame (i.e., t = 0.5) of each triplet. The test set contains 3,782 triplets, and because of its high-quality videos the dataset is widely used for comparing the performance of different algorithms.
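The triplet setup described on this slide can be sketched in a few lines of Python. This is only an illustrative toy, not the dataset's loader: the helper names `make_triplets`, `blend_midpoint`, and `mean_abs_error` are hypothetical, the frames are tiny hand-made grayscale grids, and per-pixel averaging stands in for a trained network as a naive t = 0.5 baseline.

```python
def make_triplets(frames):
    """Slide over a frame list and return (first, middle, last) triplets."""
    return [(frames[i], frames[i + 1], frames[i + 2])
            for i in range(len(frames) - 2)]


def blend_midpoint(first, last):
    """Naive t = 0.5 baseline: per-pixel average of the two outer frames."""
    return [[(a + b) / 2.0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, last)]


def mean_abs_error(pred, target):
    """Mean absolute per-pixel difference between two equal-size frames."""
    diffs = [abs(p - t)
             for row_p, row_t in zip(pred, target)
             for p, t in zip(row_p, row_t)]
    return sum(diffs) / len(diffs)


# Toy 1x4 grayscale frames whose brightness rises linearly over time
# (an assumption chosen so that averaging is exact).
frames = [[[float(10 * t + x) for x in range(4)]] for t in range(4)]
triplets = make_triplets(frames)
first, middle, last = triplets[0]
error = mean_abs_error(blend_midpoint(first, last), middle)
```

Averaging is only exact here because the toy brightness changes linearly and nothing moves; with real object motion it produces ghosting, which is the gap the surveyed methods try to close.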
  • 13. Paper / Article | Authors and Year | Research
      - FlowNet: Learning Optical Flow with Convolutional Networks | Dosovitskiy et al. (2015) | The network is entirely convolutional and can be trained using any triplet of an image sequence, which can be of different resolutions. They used the Charbonnier loss function and trained it on the KITTI dataset.
      - Learning Image Matching by Simply Watching Video | Long et al. (2016) | They developed a deep CNN that can be trained without any supervision. They exploited the temporal coherency that occurs naturally in real-world videos and used it to calculate sensitivity maps, i.e., gradients with respect to the input obtained via backpropagation.
      - Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks | Xue et al. (2016) | Their network predicts multiple extrapolated frames from a single frame. The proposed network consists of five components: a variational auto-encoder that captures motion information; a kernel decoder that learns motion kernels from the output of the motion encoder; an image encoder that estimates feature maps from an image; their novel cross-convolutional layer, which convolves the feature maps with the motion kernels; and finally a regressor.
      - Video Frame Interpolation via Adaptive Convolution | Niklaus et al. (2017) | They proposed a fully convolutional deep neural network that predicts a spatially adaptive convolution kernel for each pixel from the two given successive input frames. A separate kernel is calculated for each pixel of the interpolated frame, and the predicted pixel-wise kernels are then convolved with the input frames to generate the interpolated frame.
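The per-pixel kernel idea in the adaptive convolution entry can be illustrated with a small pure-Python sketch. It is not the authors' network: the kernels here are supplied by hand rather than predicted by a CNN, the function name `adaptive_convolve` is hypothetical, frames are grayscale grids, and border handling by clamping is an assumption made for the example.

```python
def adaptive_convolve(frame_a, frame_b, kernels_a, kernels_b):
    """Synthesize a frame by applying a different kernel at every pixel.

    frame_a, frame_b: 2D lists (H x W) of grayscale values.
    kernels_a, kernels_b: H x W grids of K x K kernels (K odd),
    one kernel per output pixel and per input frame.
    """
    height, width = len(frame_a), len(frame_a[0])
    radius = len(kernels_a[0][0]) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp so border pixels reuse the nearest valid sample.
                    sy = min(max(y + dy, 0), height - 1)
                    sx = min(max(x + dx, 0), width - 1)
                    ky, kx = dy + radius, dx + radius
                    total += kernels_a[y][x][ky][kx] * frame_a[sy][sx]
                    total += kernels_b[y][x][ky][kx] * frame_b[sy][sx]
            out[y][x] = total
    return out


# Hand-made kernels that put weight 0.5 on the centre sample of each frame,
# so the result is a plain average (a stand-in for predicted kernels).
centre = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.0]]
frame_a = [[0.0, 2.0], [4.0, 6.0]]
frame_b = [[2.0, 4.0], [6.0, 8.0]]
kernels = [[centre for _ in range(2)] for _ in range(2)]
result = adaptive_convolve(frame_a, frame_b, kernels, kernels)
```

With kernels whose weight sits off-centre, the same loop shifts content between frames, which is how a single convolution can account for motion and resampling together.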
  • 14. Paper / Article | Authors and Year | Research
      - Video Frame Interpolation via Adaptive Separable Convolution | Niklaus et al. (2017) | They improved on their previous approach by using separable convolutions. Instead of estimating a whole 2D kernel for each pixel, they estimated spatially adaptive pairs of 1D convolution kernels for each pixel, thus reducing the number of parameters to be estimated. It optimizes the algorithm within the allowable range.
      - Video Frame Synthesis using Deep Voxel Flow | Liu et al. (2017) | It calculates a dense voxel flow and uses it to generate an interpolated frame. The voxels capture the motion changes in the temporal domain, and intermediate frames can be generated using trilinear interpolation. They proposed an end-to-end, fully differentiable network that adopts a fully convolutional encoder-decoder architecture with a bottleneck layer that calculates the voxel flow.
      - Deep Video Frame Interpolation using Cyclic Frame Generation | Liu et al. (2019) | It introduces a novel loss, called cycle consistency loss, which can be integrated with any frame interpolation method. They postulated that, given three consecutive frames I1, I2, and I3, the frames interpolated from the pairs (I1, I2) and (I2, I3) can themselves be interpolated to produce a frame that is constrained by frame I2 in a cyclic manner.
      - Tracking and Frame Rate Enhancement for Real-Time 2D Human Pose Estimation | Vidanpathirana et al. (2019) | It attempted to reduce optical flow errors by designing a pose tracking system that operates on a system of queues in a multi-threaded environment and provides a fast point tracking solution to boost the frame rate of a pose estimation system.
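The parameter saving behind the separable convolution entry follows from a simple identity: a pair of 1D kernels defines a 2D kernel through their outer product. The sketch below shows this with hypothetical helper names (`outer_product`, `parameter_counts`); the kernel size of 51 is an illustrative value chosen for the example, not a figure taken from this deck.

```python
def outer_product(vertical, horizontal):
    """Build the K x K kernel implied by a pair of 1D kernels."""
    return [[v * h for h in horizontal] for v in vertical]


def parameter_counts(kernel_size):
    """Values to estimate per pixel and per input frame:
    (full 2D kernel, pair of 1D kernels)."""
    return kernel_size * kernel_size, 2 * kernel_size


# A vertical smoothing kernel combined with a horizontal pass-through kernel.
vertical = [0.25, 0.5, 0.25]
horizontal = [0.0, 1.0, 0.0]
kernel = outer_product(vertical, horizontal)

# Illustrative kernel size (an assumption for this example).
full, separable = parameter_counts(51)
```

The trade-off is that only kernels expressible as such an outer product can be represented, in exchange for a per-pixel estimate that grows linearly rather than quadratically with kernel size.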
  • 15. Paper / Article | Authors and Year | Research
      - AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation | Lee et al. (2020) | They designed a new warping module called adaptive collaboration of flows (AdaCoF), based on an operation that can reference any number of pixels at any location.
      - Video Frame Interpolation via Deformable Separable Convolution | Cheng et al. (2020) | They proposed using more relevant pixels to estimate kernels adaptively, calling the operation deformable separable convolution (DSepConv); this allows a smaller kernel size with relevant features for handling large motion. DSepConv uses an encoder-decoder network for feature extraction.
      - Multiple Video Frame Interpolation via Enhanced Deformable Separable Convolution | Cheng et al. (2020) | They were able to reduce the number of parameters to be trained while maintaining the same results. They were also able to generate multiple interpolated frames between two consecutive frames, making it the first kernel-based approach to do so. However, the results for arbitrary-time interpolation were not as good as those of state-of-the-art flow-based approaches.
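The "any number of pixels and any location" operation attributed to AdaCoF can be read as a weighted sum of samples taken at per-pixel offsets. The sketch below is a simplified illustration under stated assumptions, not the published module: the function name `offset_weighted_sample` is hypothetical, it works on one grayscale frame, the weights and offsets are given by hand rather than predicted, and offsets are restricted to integers so no sub-pixel sampling is needed.

```python
def offset_weighted_sample(frame, y, x, weights, offsets):
    """Weighted sum of samples taken at (y + dy, x + dx) for each offset.

    weights: one scalar per sample.
    offsets: one (dy, dx) integer pair per sample; any count, any location.
    """
    height, width = len(frame), len(frame[0])
    total = 0.0
    for weight, (dy, dx) in zip(weights, offsets):
        # Clamp so offsets that leave the frame reuse the nearest border pixel.
        sy = min(max(y + dy, 0), height - 1)
        sx = min(max(x + dx, 0), width - 1)
        total += weight * frame[sy][sx]
    return total


frame = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
# Two samples for the centre pixel: top-left and bottom-right neighbours.
value = offset_weighted_sample(frame, 1, 1, [0.5, 0.5], [(-1, -1), (1, 1)])
```

Fixing the offsets to a regular grid around the pixel recovers an ordinary kernel-based method, while a single sample with weight 1 behaves like flow-based warping, which is why this formulation can be seen as generalizing both.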
  • 16. Flow-Based Methods
      - High-quality video frame interpolation often depends on precise motion estimation. Mathematical or deep learning models are trained to establish a strong correlation between consecutive frames and so preserve the continuity of flow, based on the actual optical displacement of flow vectors and the trajectory of visual components, with the help of occlusion reasoning and color consistency methods. [1]
      - So far, flow-based methods have achieved results comparable to the latest GAN [2], CNN, and hybrid approaches, and they continue to evolve to cope with the heavy computational requirements of deep learning methods on real-time benchmarks.
      [1] Yazdi et al., Real-time object tracking based on an adaptive transition model and extended Kalman filter (2019)
      [2] Caballero et al., Frame interpolation with multiscale deep loss functions and generative adversarial networks (2017)
  • 17. Path-Selective Interpolation
      - Dhruv Mahajan et al. (2009) [40] proposed a path-based framework, akin to an inverse optical flow approach, that computes the background motion of an arbitrary intermediate pixel ‘p’ in the input frames.
      - Bo Yan et al. (2013) improved this scheme by introducing two main innovations, the first of which uses standard optical flow to supervise path direction by constraining path length.
      - Yizhou Fan et al. (2016) further guided the optimization of path construction by combining the conventional path-based framework of Mahajan and Bo Yan with useful feature points extracted from the input frames. Integrating semantic information identifies critical pixels in the input frames and supervises the method toward accurate motion pattern recognition via optimal energy minimization, thus avoiding wrong path selection and achieving more natural results.
  • 18. GAN-Based Interpolation
      - Inspired by the evident success of deep convolutional neural networks in video processing tasks and by the thorough research on GANs by Goodfellow since 2014, Mark Koren et al. designed the first GAN interpolation network (FINNiGAN [111]) in 2016. It managed to cut the traditional “ghosting” and “tearing” artifacts and compensated for fast-motion discrepancies.
      - [Figure: the general architecture of a GAN framework.] A few developments are discussed below.
      - FINNiGAN enhances the frame rate by combining a convolutional neural network framework with generative adversarial networks.
      - The idea is to prevent the common loss of structural information during up-sampling by using a SIN (structure interpolation network) that produces the structure of the intermediate frame.
  • 19. GAN-Based Interpolation (continued)
      - Zhe Hu et al. (2018) [112] proposed a multi-scale structure that shares parameters across layers and avoids costly global optimization, unlike the usual flow-based methods. MSFSN (multi-scale frame synthesis network) became the first model to provide flexibility by parametrizing the temporal location of the frame of interest.
      - Chenguang Li et al. (2018) [113] exploited a multi-scale CNN architecture to support long-range and varied motion and employed an additional WGAN-GP loss (Wasserstein generative adversarial network loss with gradient penalty) [114] to achieve more natural results. The model has a slim generator network that requires relatively little storage and runs at high speed, owing to its residual structure and cumulative generation of high-resolution frames.
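The gradient penalty term in WGAN-GP penalizes the critic when the norm of its gradient, taken at a random mix of a real and a generated sample, departs from 1. The following is a minimal sketch of that term only, under loud assumptions: the critic is an ordinary Python function on a flat list of values, its gradient is approximated by central finite differences instead of automatic differentiation, the helper names are hypothetical, and the penalty weight of 10.0 is a commonly used default rather than a value stated in this deck.

```python
import math
import random


def numerical_gradient(critic, point, step=1e-5):
    """Central finite-difference gradient of a scalar critic at `point`."""
    grad = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += step
        down[i] -= step
        grad.append((critic(up) - critic(down)) / (2 * step))
    return grad


def gradient_penalty(critic, real, fake, weight=10.0, rng=random):
    """weight * (||grad critic(mixed)|| - 1)^2 at a random real/fake mix."""
    eps = rng.random()
    mixed = [eps * r + (1 - eps) * f for r, f in zip(real, fake)]
    grad = numerical_gradient(critic, mixed)
    norm = math.sqrt(sum(g * g for g in grad))
    return weight * (norm - 1.0) ** 2


def linear_critic(sample):
    """Toy critic with constant gradient (3, 4), so its gradient norm is 5."""
    return 3.0 * sample[0] + 4.0 * sample[1]


penalty = gradient_penalty(linear_critic, [1.0, 0.0], [0.0, 1.0])
```

For the toy critic the gradient norm is 5 everywhere, so the penalty is 10 × (5 − 1)² = 160 regardless of the random mixing coefficient; a critic with unit gradient norm would incur no penalty.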
  • 20. Conclusion
      - This paper puts forward a comprehensive survey of classic video frame interpolation techniques using deep learning and presents a broad overview of the widely used benchmark evaluations.
      - The five modalities exhibit their unique features and branch into diverse choices of deep learning techniques that use their properties productively.
      - The inherent spatial, temporal, and structural attributes of a video sequence are identified. From the aspect of spatiotemporal structural encoding, the survey highlights the pros and cons of the available techniques.
      - Based on the key insights underlined by the survey, the problem of video frame interpolation offers promising research opportunities. New techniques in deep learning will enlarge the scope for improvement in frame interpolation.
      - GAN-based training paradigms show a promising future in the field of frame interpolation. Combining frame interpolation with other video processing tasks also seems to interest researchers.