This presentation was delivered at the Ridge-i Yomekai event in December 2018, covering the NIPS 2018 paper "Video-to-Video Synthesis" by researchers from Nvidia and MIT.
2. Abstract
▰Using a GAN coupled with a spatiotemporal adversarial objective, it is possible to create temporally coherent, 30-second videos at 2K resolution from segmentation masks, poses, and sketches.
▰A variety of datasets and applications were used for evaluation.
3. Outline
▰The problem.
Lack of temporal coherence.
▰Previous efforts
State-of-the-art methods and their limitations
▰Proposed model.
Experiment
▰Conclusions
4. The problem
▰Video synthesis models aim to generate realistic videos without explicitly specifying scene geometry, material dynamics, or lighting.
▰Most current models focus on textural information.
▰The latest proposals generate videos that are short in duration and contain many artifacts.
8. Introducing a foreground/background split in the generator
▰Splitting the scene into foreground and background lets each sub-network specialize:
▰Background regions can be generated accurately.
▰The background hallucination network reconstructs the occluded parts.
▰The foreground, which carries most of the motion, benefits from strong optical flow estimates.
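The split above amounts to compositing two pixel sources with a soft mask: pixels that optical flow can track are warped from the previous frame, while occluded or newly appearing pixels come from the hallucination network. Below is a minimal numpy sketch of that blending step, not the paper's implementation; `warped_prev`, `hallucinated`, and `mask` are assumed inputs produced elsewhere.

```python
import numpy as np

def composite_frame(warped_prev, hallucinated, mask):
    """Blend a flow-warped previous frame with hallucinated pixels.

    warped_prev:  (H, W, 3) previous frame warped by estimated optical flow
    hallucinated: (H, W, 3) pixels synthesized from scratch by the generator
    mask:         (H, W, 1) soft mask in [0, 1]; 1 keeps the warped pixel,
                  0 falls back to hallucination (e.g. occluded regions)
    """
    return mask * warped_prev + (1.0 - mask) * hallucinated
```

Because the mask is soft, the network can smoothly trade off between reusing temporally consistent pixels and inventing new content.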
9. Multimodal synthesis support
▰Encode the ground-truth image into a 3-dimensional feature map.
▰Apply instance-wise average pooling so that all pixels of the same object share one feature vector.
▰Feed the average-pooled features together with the semantic masks to the generator.
▰Given different feature vectors, the generator F can create objects with a variety of visual appearances.
10. Experiments - Technical details:
Training starts from a few low-resolution frames and progresses to 30-second videos at 2K resolution.
Epochs: 40
Optimizer: Adam
Learning rate: 0.0002
Batch size: 1 video
Machine: Nvidia DGX1
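The coarse-to-fine training above can be sketched as a stage schedule. Only the epoch count, optimizer, learning rate, and batch size come from the slide; the intermediate frame counts and resolutions are hypothetical placeholders:

```python
def progressive_schedule(total_epochs=40):
    """Yield training stages that grow sequence length and resolution.

    Stage sizes are illustrative; the slide only states the endpoints
    (a few low-resolution frames -> 30-second 2K videos).
    """
    # Reported hyperparameters: Adam, lr 0.0002, one video per batch.
    base = {"optimizer": "Adam", "lr": 2e-4, "batch_videos": 1}
    stages = [(4, 512), (8, 1024), (16, 1536), (30, 2048)]  # hypothetical
    per_stage = total_epochs // len(stages)
    for n_frames, resolution in stages:
        yield {**base, "frames": n_frames,
               "resolution": resolution, "epochs": per_stage}
```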