
[2018 Taiwan AI Academy Alumni Conference] Video Frame Synthesis / Yen-Yu Lin (林彥宇)



Yen-Yu Lin (林彥宇) / Associate Research Fellow, Research Center for IT Innovation, Academia Sinica



  1. Video Synthesis, Yen-Yu Lin, Associate Research Fellow, Research Center for IT Innovation, Academia Sinica
  2. About Yen-Yu Lin • Yen-Yu Lin, Associate Research Fellow, CITI, Academia Sinica • Research interests:  Computer Vision (CV): let computers see, recognize, and interpret the world as humans do  Machine Learning (ML): provide statistical tools for modeling how the human visual system works • Goal: design ML methods that facilitate CV applications
  3. Which video do you prefer? (Original Video vs. 8X Video) [Liu et al. AAAI'19]
  4. Which video do you prefer? (Original Video vs. 8X Video) [Liu et al. AAAI'19]
  5. Outline • Introduction • Related Work • Our Idea and Approach • Experimental Results • Conclusions
  6. Video frame interpolation • Video interpolation produces videos of higher frame rates  Problem formulation: Predict the intermediate frame between two consecutive frames (1x video -> 2x video)
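As a toy illustration of this problem formulation (not from the talk), an interpolator maps two consecutive frames to the one in between; the simplest conceivable baseline averages pixels, which learned methods such as DVF replace with a CNN:

```python
import numpy as np

def interpolate_midframe_naive(frame0, frame1):
    """Toy baseline for frame interpolation: predict the intermediate
    frame as the pixel-wise average of two consecutive frames.
    Learned methods (e.g., DVF) replace this with a CNN."""
    return 0.5 * (frame0 + frame1)

# Two consecutive 2x2 grayscale frames of a gradually brightening scene
f0 = np.array([[0.0, 0.2], [0.4, 0.6]])
f1 = np.array([[0.2, 0.4], [0.6, 0.8]])
mid = interpolate_midframe_naive(f0, f1)  # predicted frame at t = 0.5
```

This baseline produces exactly the over-smoothed, ghosted results the talk later criticizes, which motivates learning the mapping instead.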
  7. Why video interpolation • High-frame-rate videos have temporally coherent content and smooth view transitions • Acquiring such videos directly incurs higher power consumption and greater storage requirements • Video interpolation offers a trade-off between user experience and acquisition cost
  8. Outline • Introduction • Related Work • Our Idea and Approach • Experimental Results • Conclusions
  9. Related work • Video frame interpolation  Conventional (non-deep-learning-based) methods  CNN-based methods • Predict the optical flow • Predict the intermediate frame
  10. Related work • Video frame interpolation  Conventional (non-deep-learning-based) methods • Dense motion correspondences -> optical flow • Optimize a complex objective function • ✗ Time-consuming • ✗ Computationally expensive  CNN-based methods • Predict the optical flow • Predict the intermediate frame
  11. Optical flow (illustration: www.commonvisionblox.com)
  12. Related work • Video frame interpolation  Conventional (non-deep-learning-based) methods  CNN-based methods • Predict the optical flow based on FlowNet [Dosovitskiy et al. ICCV'15] • ✗ Hard to obtain supervised training data (ground-truth flow)
  13. Related work • Video frame interpolation  Conventional (non-deep-learning-based) methods  CNN-based methods • Predict the intermediate frame directly, e.g., Deep Voxel Flow (DVF) [Liu et al. ICCV'17] • ✓ More efficient, with more pleasing results
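The flow-based family above synthesizes a frame by predicting a flow field and sampling the input at displaced positions. A minimal sketch of backward warping, with nearest-neighbor sampling for simplicity (DVF itself uses trilinear sampling of a voxel flow across both input frames, so this is an illustrative simplification):

```python
import numpy as np

def warp_with_flow(frame, flow):
    """Backward warping: for each output pixel, sample the source frame
    at the position displaced by the flow vector (nearest neighbor).
    flow[y, x] = (dx, dy) points from the target pixel into the source."""
    h, w = frame.shape
    out = np.zeros_like(frame)
    for y in range(h):
        for x in range(w):
            sx = min(max(int(round(x + flow[y, x, 0])), 0), w - 1)
            sy = min(max(int(round(y + flow[y, x, 1])), 0), h - 1)
            out[y, x] = frame[sy, sx]
    return out

# A uniform flow of (+1, 0) samples one pixel to the right,
# shifting the image content one pixel to the left
frame = np.arange(9.0).reshape(3, 3)
flow = np.zeros((3, 3, 2))
flow[..., 0] = 1.0
shifted = warp_with_flow(frame, flow)
```

Predicting the flow instead of the pixels is what makes these methods trainable without ground-truth intermediate-frame flow: the warp is differentiable, so the frame reconstruction error supervises the flow.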
  14. Outline • Introduction • Related Work • Our Idea and Approach • Experimental Results • Conclusions
  15. CNN-based methods for intermediate frame prediction • The problems: artifacts and over-smoothed results
  16. Our idea: Cycle consistency checking • Observation: over-smoothed frames or frames with artifacts cannot reconstruct the original frames well
  17. A two-stage training procedure • Our method is built upon DVF [Liu et al. ICCV'17] • Stage 1: Pre-train the DVF, a U-Net (fully convolutional encoder-decoder with skip connections)
  18. A two-stage training procedure • Stage 2: Include the cycle consistency loss  Duplicate the learned DVF three times  Compute the reconstruction error for cycle consistency checking  Fine-tune all DVF models
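The cycle consistency check can be sketched as follows: run the interpolator on (frame0, frame1) and on (frame1, frame2), then interpolate between the two synthesized frames; if the synthesized frames are sharp and artifact-free, the result should reconstruct frame1. This is a schematic of the idea assuming an L1 reconstruction error; the exact loss and network wiring follow the paper.

```python
import numpy as np

def cycle_consistency_loss(f0, f1, f2, interp):
    """Cycle consistency: interpolate midpoints of (f0, f1) and (f1, f2),
    then interpolate between the two synthesized frames. The result
    should reconstruct f1; the reconstruction error is the loss."""
    m_05 = interp(f0, f1)        # synthesized frame at t = 0.5
    m_15 = interp(f1, f2)        # synthesized frame at t = 1.5
    f1_rec = interp(m_05, m_15)  # should land back on f1 (t = 1.0)
    return float(np.mean(np.abs(f1_rec - f1)))  # L1 reconstruction error

# With linear motion and a perfect (linear) interpolator the cycle closes,
# while a biased interpolator leaves a nonzero reconstruction error
interp = lambda a, b: 0.5 * (a + b)
f0, f1, f2 = (np.full((2, 2), v) for v in (0.0, 1.0, 2.0))
loss = cycle_consistency_loss(f0, f1, f2, interp)  # 0.0
```

Because every step reuses the same (duplicated) DVF, the reconstruction error back-propagates through all three copies, which is why the whole pipeline stays end-to-end trainable.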
  19. Network architecture
  20. (Qualitative comparison: input, DVF, and DVF + cycle loss)
  21. Motion linearity loss • Assume the interval between two frames is short enough that the motion between them is linear
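Under this short-interval assumption, the displacement from frame 0 to frame 2 should be twice the displacement from frame 0 to frame 1. A hedged sketch of such a penalty on predicted flow fields (the exact formulation in the paper may differ):

```python
import numpy as np

def motion_linearity_loss(flow_01, flow_02):
    """If motion is linear over a short interval, the flow across two
    steps should equal twice the flow across one step; penalize the
    mean squared deviation from that relation."""
    return float(np.mean((flow_02 - 2.0 * flow_01) ** 2))

# Perfectly linear motion: every pixel moves (1, 0) per frame step
flow_01 = np.tile([1.0, 0.0], (4, 4, 1))  # shape (4, 4, 2)
flow_02 = 2.0 * flow_01
```

The loss is zero only when the two predicted flow fields agree with the linear-motion model, so it acts purely as a regularizer on training.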
  22. Edge-guided training • Interpolation in highly textured regions is difficult, so edge maps are added to the input to help preserve edges
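The slide does not say which edge detector produces the edge maps, so as an illustrative assumption here is a plain Sobel gradient magnitude, the kind of map one could concatenate to the network input:

```python
import numpy as np

def sobel_edge_map(img):
    """Edge map as Sobel gradient magnitude (an illustrative choice;
    the talk does not specify the edge detector used)."""
    kx = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
    ky = kx.T
    pad = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    gx = np.empty((h, w))
    gy = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    return np.hypot(gx, gy)

# A vertical step edge: flat regions respond with 0, the boundary responds
img = np.zeros((4, 4))
img[:, 2:] = 1.0
edges = sobel_edge_map(img)
```

Feeding this map alongside the frames tells the network where the textured, hard-to-interpolate regions are.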
  23. Outline • Introduction • Related Work • Our Idea and Approach • Experimental Results • Conclusions
  24. Experimental results: Ablation studies on the UCF101 dataset (figure: input and ground truth vs. (b) baseline DVF, (c) + cycle, (d) + cycle + motion, (e) + cycle + edge, (f) full model)
  25. Experimental results: Ablation studies on the UCF101 dataset • Cycle loss makes our model robust to the lack of training data • PSNR (dB) vs. number of training triplets (280,000 / 28,000 / 2,800 / 280) on the video "See You Again": w/o cycle + motion: 39.69 / 39.16 / 38.18 / 35.75; w/ cycle + motion: 40.60 / 40.47 / 39.88 / 38.13 • A plot on the UCF101 testing set shows the same trend
  26. Experimental results: Comparison with SoTA methods • On the UCF101 dataset • On the Middlebury dataset
  27. Experimental results: Demo videos (1X vs. 8X)
  28. Outline • Introduction • Related Work • Our Idea and Approach • Experimental Results • Conclusions
  29. Concluding remarks • We present a novel loss, the cycle consistency loss  Works with existing methods and remains end-to-end trainable  Yields better synthesis results and is robust to limited training data • Two extensions: motion linearity loss and edge-guided training  Regularize the training procedure  Further improve performance • Future plans:  Interpolation -> extrapolation (video prediction)  Temporal -> spatio-temporal (super-resolution + video interpolation)  Adversarial learning
  30. Reference: Yu-Lun Liu, Yi-Tung Liao, Yen-Yu Lin, and Yung-Yu Chuang, "Deep Video Frame Interpolation using Cyclic Frame Generation," AAAI 2019
  31. Thank You for Your Attention! Yen-Yu Lin (林彥宇) Email: yylin@citi.sinica.edu.tw URL: http://cvlab.citi.sinica.edu.tw/
