This presentation was delivered at the Ridge-i Yomekai event in December 2018, covering the NIPS 2018 paper "Video-to-Video Synthesis" by researchers from Nvidia and MIT.
2. Abstract
▰Using a GAN coupled with a spatiotemporal adversarial objective, it is possible to create temporally coherent, 30-second videos at 2K resolution from segmentation masks, poses, and sketches.
▰A variety of datasets and applications were used for evaluation.
3. Outline
▰The problem.
Lack of temporal coherence.
▰Previous efforts
State-of-the-art methods and their limitations
▰Proposed model.
Experiment
▰Conclusions
4. The problem
▰Video synthesis models aim to generate realistic videos without explicitly specifying scene geometry, material dynamics, or lighting.
▰Most current models focus on textural information.
▰The latest proposals generate videos that are short in duration and contain many artifacts.
8. Introducing a foreground/background split in the generator
▰Splitting the scene into foreground and background lets each sub-network specialize:
▰Background regions can be generated accurately.
▰The background hallucination network reconstructs the occluded parts.
▰The foreground, which carries most of the motion, benefits from strong optical flow estimates.
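The split above amounts to compositing two pixel sources with a soft mask: pixels that optical flow can track are warped from the previous frame, while occluded or newly appearing pixels come from the hallucination network. Below is a minimal numpy sketch of that blending step, not the paper's implementation; `warped_prev`, `hallucinated`, and `mask` are assumed inputs produced elsewhere.

```python
import numpy as np

def composite_frame(warped_prev, hallucinated, mask):
    """Blend a flow-warped previous frame with hallucinated pixels.

    warped_prev:  (H, W, 3) previous frame warped by estimated optical flow
    hallucinated: (H, W, 3) pixels synthesized from scratch by the generator
    mask:         (H, W, 1) soft mask in [0, 1]; 1 keeps the warped pixel,
                  0 falls back to hallucination (e.g. occluded regions)
    """
    return mask * warped_prev + (1.0 - mask) * hallucinated
```

Because the mask is soft, the network can smoothly trade off between reusing temporally consistent pixels and inventing new content.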
9. Multimodal synthesis support
▰Encode the ground-truth image into a 3-dimensional feature map.
▰Apply instance-wise average pooling so that all pixels of the same object share one feature vector.
▰Feed the average-pooled features together with the semantic masks to the generator.
▰Given different feature vectors, the generator F can create objects with a variety of visual appearances.
10. Experiments - Technical details:
Training starts from a few low-resolution frames and progresses to 30-second videos at 2K resolution.
Epochs: 40
Optimizer: Adam
Learning rate: 0.0002
Batch size: 1 video
Machine: Nvidia DGX1
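The coarse-to-fine training above can be sketched as a stage schedule. Only the epoch count, optimizer, learning rate, and batch size come from the slide; the intermediate frame counts and resolutions are hypothetical placeholders:

```python
def progressive_schedule(total_epochs=40):
    """Yield training stages that grow sequence length and resolution.

    Stage sizes are illustrative; the slide only states the endpoints
    (a few low-resolution frames -> 30-second 2K videos).
    """
    # Reported hyperparameters: Adam, lr 0.0002, one video per batch.
    base = {"optimizer": "Adam", "lr": 2e-4, "batch_videos": 1}
    stages = [(4, 512), (8, 1024), (16, 1536), (30, 2048)]  # hypothetical
    per_stage = total_epochs // len(stages)
    for n_frames, resolution in stages:
        yield {**base, "frames": n_frames,
               "resolution": resolution, "epochs": per_stage}
```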