[2018 Taiwan AI Academy Alumni Annual Meeting] Video Synthesis / Yen-Yu Lin (林彥宇)

Video Synthesis
Yen-Yu Lin (林彥宇), Associate Research Fellow
Research Center for IT Innovation (CITI), Academia Sinica

About Yen-Yu Lin
• Yen-Yu Lin, Associate Research Fellow, CITI, Academia Sinica
• Research interests:
  - Computer Vision (CV): Let computers see, recognize, and interpret the world like humans
  - Machine Learning (ML): Provide a statistical way to learn how the human visual system works
  - Goal: Design ML methods to facilitate CV applications

Which video do you prefer?
[Video comparison: Original Video vs. 8X Video] [Liu et al. AAAI’19]

Outline
• Introduction
• Related Work
• Our Idea and Approach
• Experimental Results
• Conclusions

Video frame interpolation
• Video interpolation produces videos of higher frame rates
  - Problem formulation: Predict the intermediate frame between two consecutive frames
[Figure: Video 1x vs. Video 2x; the frame marked "?" between two consecutive frames is the one to predict]
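The problem formulation above can be sketched in a few lines. This is only an illustration under stated assumptions: `blend_midframe` is a hypothetical placeholder that averages the two frames (a naive baseline, not the learned model discussed in the talk), and `interpolate_2x` is a hypothetical helper that turns a 1x sequence into a 2x sequence.

```python
import numpy as np

def blend_midframe(frame_a, frame_b):
    """Hypothetical placeholder predictor: average the two frames.
    A learned interpolation model would replace this function."""
    return 0.5 * (frame_a.astype(np.float64) + frame_b.astype(np.float64))

def interpolate_2x(frames, predict=blend_midframe):
    """Insert one predicted frame between every pair of consecutive frames."""
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        out.append(a.astype(np.float64))
        out.append(predict(a, b))  # the "?" frame between a and b
    out.append(frames[-1].astype(np.float64))
    return out

# Toy 1x video: three 2x2 frames whose brightness grows linearly.
video_1x = [np.full((2, 2), v, dtype=np.float64) for v in (0.0, 2.0, 4.0)]
video_2x = interpolate_2x(video_1x)
```

Applying `interpolate_2x` three times in a row would give an 8X sequence of the kind shown in the demo, assuming the predictor is good enough to be applied to its own outputs.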

Why video interpolation
• High-frame-rate videos have temporally coherent content and smooth view transitions
• Acquiring such videos requires more power and more storage
• Video interpolation offers a compromise between user experience and acquisition cost

Related work
• Video frame interpolation
  - Conventional (non-deep-learning-based) methods
  - CNN-based methods
    • Predict the optical flow
    • Predict the intermediate frame

Related work
• Conventional methods
  - Dense motion correspondences -> optical flow
  - Optimize a complex objective function
  - ✗ Time-consuming
  - ✗ Computationally expensive
• CNN-based methods
  - Predict the optical flow
  - Predict the intermediate frame

Optical flow
[Figure: optical flow illustration; image source: www.commonvisionblox.com]

Related work
• Predict the optical flow based on FlowNet
• ✗ Supervised training data is hard to obtain
[Dosovitskiy et al. ICCV’15]

Related work
• Predict the intermediate frame, e.g., Deep Voxel Flow (DVF)
• ✓ More efficient and pleasing results
[Liu et al. ICCV’17]
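To make "predict the intermediate frame" more concrete, here is a simplified sketch of how a voxel-flow style method could synthesize the middle frame once a network has predicted a per-pixel motion field and a blending mask. Everything here is an assumption for illustration: the function name `synthesize_midframe`, the nearest-neighbor sampling (a real implementation would typically use differentiable bilinear or trilinear sampling), and the convention that `flow` is the half-step motion from the middle frame.

```python
import numpy as np

def synthesize_midframe(frame0, frame1, flow, mask):
    """Hypothetical sketch of flow-based middle-frame synthesis.

    frame0, frame1: (H, W) grayscale frames.
    flow: (H, W, 2) per-pixel half-step motion (dx, dy), assumed symmetric.
    mask: (H, W) blend weights in [0, 1]; 0 uses frame0 only, 1 uses frame1 only.
    """
    h, w = frame0.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = flow[..., 0], flow[..., 1]
    # Look backward into frame0 and forward into frame1 (nearest neighbor, clipped).
    x0 = np.clip(np.rint(xs - dx).astype(int), 0, w - 1)
    y0 = np.clip(np.rint(ys - dy).astype(int), 0, h - 1)
    x1 = np.clip(np.rint(xs + dx).astype(int), 0, w - 1)
    y1 = np.clip(np.rint(ys + dy).astype(int), 0, h - 1)
    return (1.0 - mask) * frame0[y0, x0] + mask * frame1[y1, x1]

# Toy example: a bright pixel moves two columns to the right between frames.
frame0 = np.zeros((3, 5)); frame0[1, 1] = 1.0
frame1 = np.zeros((3, 5)); frame1[1, 3] = 1.0
flow = np.zeros((3, 5, 2)); flow[..., 0] = 1.0   # half of the two-pixel motion
mask = np.full((3, 5), 0.5)
mid = synthesize_midframe(frame0, frame1, flow, mask)
```

In this toy case the bright pixel lands in the middle column of the synthesized frame, because no optical-flow ground truth is needed: the only supervision such a model requires is the real middle frame.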

CNN-based methods for intermediate frame prediction
• The problems: artifacts and over-smoothed results

Our idea: Cycle consistency checking
• Observation: Over-smoothed frames, or frames with artifacts, cannot reconstruct the original frames well

A two-stage training procedure
• Our method is built upon DVF [Liu et al. ICCV’17]
• Stage 1: Pre-train the DVF
  - U-Net: fully convolutional, encoder + decoder, skip connections

A two-stage training procedure
• Stage 2: Include the cycle consistency loss
  - Duplicate the learned DVF three times
  - Compute the reconstruction error for cycle consistency checking
  - Fine-tune all DVF models
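The Stage 2 idea can be sketched as follows, assuming three consecutive frames and a shared interpolator `f`. The function name `cycle_consistency_loss`, the L1 reconstruction error, and the averaging placeholder `avg` are assumptions for illustration; in the talk, `f` is the pre-trained DVF applied three times.

```python
import numpy as np

def cycle_consistency_loss(f, i0, i1, i2):
    """Hypothetical sketch: interpolate twice, then interpolate the results
    to reconstruct the original middle frame i1 and measure the error."""
    i05 = f(i0, i1)        # first copy: frame between i0 and i1
    i15 = f(i1, i2)        # second copy: frame between i1 and i2
    i1_rec = f(i05, i15)   # third copy: reconstruct the original i1
    return float(np.mean(np.abs(i1_rec - i1)))  # L1 error (an assumption)

def avg(a, b):
    """Placeholder interpolator standing in for the learned model."""
    return 0.5 * (a + b)

linear = [np.full((2, 2), v) for v in (0.0, 2.0, 4.0)]
nonlinear = [np.full((2, 2), v) for v in (0.0, 2.0, 10.0)]
loss_linear = cycle_consistency_loss(avg, *linear)
loss_nonlinear = cycle_consistency_loss(avg, *nonlinear)
```

With the averaging placeholder, the linear sequence reconstructs its middle frame exactly, while the nonlinear one does not. That gap is the signal the cycle loss uses to penalize poor intermediate frames.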

Motion linearity loss
• Motion linearity loss: Assume that the interval between two frames
is short enough so that the motion between them is linear
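One way to turn the linearity assumption into a loss is sketched below. The slide states only the assumption itself; the specific form used here (motion estimated over a two-frame interval should equal twice the motion estimated over a one-frame interval) and the names `motion_linearity_loss`, `flow_long`, and `flow_short` are assumptions for illustration.

```python
import numpy as np

def motion_linearity_loss(flow_long, flow_short):
    """Hypothetical sketch: under linear motion, the flow over an interval
    twice as long should be twice as large. Penalize the squared deviation.

    flow_long:  (H, W, 2) flow estimated over the longer interval.
    flow_short: (H, W, 2) flow estimated over an interval half as long.
    """
    return float(np.mean((flow_long - 2.0 * flow_short) ** 2))

flow_short = np.ones((4, 4, 2))
consistent = motion_linearity_loss(2.0 * flow_short, flow_short)    # linear motion
inconsistent = motion_linearity_loss(3.0 * flow_short, flow_short)  # violates linearity
```

A term like this acts as a regularizer: it does not need extra labels, only the flow fields the model already predicts.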

Edge-guided training
• Edge-guided training: Interpolation in highly textured regions is difficult. Hence, edge maps are added to the input to help preserve edges.
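Adding edge maps to the input can be sketched as stacking extra channels next to the frames. This is a minimal illustration under assumptions: the edge detector here is a simple gradient magnitude chosen for brevity (the slide does not say which detector is used), and the names `gradient_edge_map` and `build_edge_guided_input` are hypothetical.

```python
import numpy as np

def gradient_edge_map(gray):
    """Stand-in edge detector: normalized gradient magnitude (an assumption)."""
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    peak = mag.max()
    return mag / peak if peak > 0 else mag

def build_edge_guided_input(frame0, frame1):
    """Stack both frames and their edge maps into one (H, W, 4) network input."""
    channels = [frame0, frame1, gradient_edge_map(frame0), gradient_edge_map(frame1)]
    return np.stack([c.astype(np.float64) for c in channels], axis=-1)

# Toy frames: one with a vertical step edge, one completely flat.
step = np.zeros((4, 6)); step[:, 3:] = 1.0
flat = np.zeros((4, 6))
net_input = build_edge_guided_input(step, flat)
```

The network then sees explicit edge locations alongside the pixel values, which is the extra cue the slide refers to.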

Experimental results: Ablation studies on UCF dataset
[Figure: qualitative comparison. Panels: Input; (a) Ground truth; (b) Baseline (DVF); (c) + Cycle; (d) + Cycle + Motion; (e) + Cycle + Edge; (f) Full model]

Experimental results: Ablation studies on UCF dataset
• Cycle loss makes our model robust to the lack of training data
[Figure: two plots of PSNR (dB) vs. data size (number of triplets: 280000, 28000, 2800, 280), each comparing "w/o cycle + motion" with "w/ cycle + motion". Panels: UCF101 testing set; Video: "See You Again". Data labels that survive from the plots: 39.69, 39.16, 38.18, 35.75 and 40.6, 40.47, 39.88, 38.13.]
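For readers unfamiliar with the metric on the vertical axis, PSNR can be computed as below. This is the standard definition; the function name `psnr` and the default peak value of 255 (8-bit images) are assumptions for this sketch.

```python
import numpy as np

def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB between a predicted and a reference frame."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(max_value ** 2 / mse))

reference = np.zeros((8, 8))
prediction = reference + 25.5   # uniform error of 10% of the peak value
score = psnr(prediction, reference)
```

Higher is better. Because the scale is logarithmic, a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error.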

Experimental results: Comparison with SoTA methods
• On the UCF-101 dataset
• On the Middlebury dataset

Experimental results: Demo videos
[Video comparison: 1X vs. 8X]

Concluding remarks
• We present a novel loss called the cycle consistency loss
  - Works with existing methods and remains end-to-end trainable
  - Gives better synthesis results and is robust to limited training data
• Two extensions: motion linearity loss and edge-guided training
  - Regularize the training procedure
  - Further improve the performance
• Future plans:
  - Interpolation -> Extrapolation (video prediction)
  - Temporal -> Spatio-temporal (super-resolution + video interpolation)
  - Adversarial learning

Reference
Yu-Lun Liu (劉育綸), Yi-Tung Liao (廖苡彤), Yen-Yu Lin (林彥宇), and Yung-Yu Chuang (莊永裕). Deep Video Frame Interpolation using Cyclic Frame Generation. AAAI 2019.

Thank You for Your Attention!
Yen-Yu Lin (林彥宇)
Email: yylin@citi.sinica.edu.tw
URL: http://cvlab.citi.sinica.edu.tw/
