[Paper Review] Temporal Generative Adversarial Nets with Singular Value Clipping



Hyeongmin Lee

Paper review for "Temporal Generative Adversarial Nets with Singular Value Clipping"


Temporal Generative Adversarial Nets
with Singular Value Clipping
Small Paper Reading Group, 2018-2, 2nd session
Image and Video Pattern Recognition Lab, Hyeongmin Lee (이형민)

Research Areas
• Video Generation
  - Frame Interpolation
  - Frame Extrapolation (Future Frame Prediction)
  - Image Animation
• Video-Sound Fusion
  - Sound of Pixels
  - Cocktail Party Effect
• Simple Video Processing
  - Image Processing → Video
Common thread: video and the time axis

GAN
[Figure: Generator vs. Discriminator]
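As a reminder of what the Generator/Discriminator game optimizes, here is a minimal sketch of the standard GAN losses (binary cross-entropy for the discriminator, the common "non-saturating" loss for the generator). This is our illustration of the generic GAN objective, not code from the paper; the paper itself builds on the Wasserstein GAN, and the function names here are hypothetical.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Standard GAN discriminator loss (binary cross-entropy).

    d_real: D(x) for real samples, probabilities in (0, 1)
    d_fake: D(G(z)) for generated samples, probabilities in (0, 1)
    D is trained to push d_real toward 1 and d_fake toward 0.
    """
    real_term = sum(-math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(-math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: G is trained to push D(G(z)) toward 1."""
    return sum(-math.log(p) for p in d_fake) / len(d_fake)

# A discriminator that cannot tell real from fake outputs 0.5 everywhere,
# which gives the familiar value 2 * log 2 (about 1.386).
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
```

The generator loss falls as the discriminator is fooled more often, which is the opposing pressure the slide's diagram depicts.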

GAN with Temporal Data?
Image vs. Video
The spatial and temporal axes are all treated the same way!!
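To make the "all axes treated the same" point concrete, the sketch below (our own illustration, not code from the paper) implements a naive single-channel 3D convolution over a (time, height, width) volume. Swapping the time and height axes of both the input and the kernel yields exactly the same result with the axes relabeled, so the operator has no built-in notion of which axis is time. The function name `conv3d_valid` and the random sizes are hypothetical.

```python
import numpy as np

def conv3d_valid(video, kernel):
    """Naive single-channel 3D cross-correlation with 'valid' padding.

    video:  array of shape (T, H, W)
    kernel: array of shape (kT, kH, kW)
    The kernel slides along time exactly as it slides along height and width.
    """
    T, H, W = video.shape
    kT, kH, kW = kernel.shape
    out = np.zeros((T - kT + 1, H - kH + 1, W - kW + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                window = video[t:t + kT, y:y + kH, x:x + kW]
                out[t, y, x] = np.sum(window * kernel)
    return out

rng = np.random.default_rng(0)
video = rng.standard_normal((8, 10, 12))   # (time, height, width)
kernel = rng.standard_normal((3, 3, 3))

out = conv3d_valid(video, kernel)
# Swap the time and height axes of both the input and the kernel.
out_swapped = conv3d_valid(video.transpose(1, 0, 2), kernel.transpose(1, 0, 2))
# Same computation, axes merely relabeled: time is just another spatial axis.
print(np.allclose(out.transpose(1, 0, 2), out_swapped))
```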

Temporal GAN
• 3D Convolution
• (channel, time, height, width)
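The (channel, time, height, width) layout is the per-sample layout that PyTorch's `nn.Conv3d` expects as (C, D, H, W), with D playing the role of time. The sketch below tracks tensor shapes through a stack of 3D convolutions using the usual output-size formula, applied identically to time, height, and width. The 16-frame 64x64 RGB input and the 4x4x4 kernel, stride 2, padding 1 layers are hypothetical values chosen for illustration, not a claim about the paper's exact architecture.

```python
def conv_out_size(n, kernel, stride=1, padding=0):
    """Output length along one axis for a convolution (no dilation)."""
    return (n + 2 * padding - kernel) // stride + 1

def conv3d_out_shape(shape, out_channels, kernel, stride=1, padding=0):
    """Shape after a 3D convolution on a (channel, time, height, width) tensor.

    The same kernel/stride/padding is applied to time, height, and width,
    which is how a plain 3D convolution treats the three axes.
    """
    _, t, h, w = shape
    return (out_channels,) + tuple(
        conv_out_size(n, kernel, stride, padding) for n in (t, h, w)
    )

# Hypothetical discriminator input: an RGB clip of 16 frames at 64x64.
x = (3, 16, 64, 64)
for out_ch in (64, 128, 256, 512):
    x = conv3d_out_shape(x, out_ch, kernel=4, stride=2, padding=1)
    print(x)
# (64, 8, 32, 32)
# (128, 4, 16, 16)
# (256, 2, 8, 8)
# (512, 1, 4, 4)
```

Because time starts much shorter than height and width, it collapses to length 1 first, one practical consequence of downsampling every axis at the same rate.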

Future Research Ideas
• Frame Interpolation
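I have read many papers on it, but have not yet turned them into a concrete idea.
• Text-Guided Image Animation
I became interested in approaches that constrain the model's creativity and degrees of freedom while steering it in the direction a human wants.
• Motion Deblurring using Frame Interpolation
Video is a natural dataset, labeled by nature itself.
Idea: raise the video frame rate to synthesize motion-blurred images artificially, then train a model on the inverse mapping (blurred → sharp).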
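The data-generation step of the deblurring idea can be sketched by averaging a window of consecutive sharp frames to approximate a long exposure, and pairing the result with the sharp middle frame as the training target. This is a minimal sketch under stated assumptions: the frames are assumed to already be at a high frame rate (for example after frame interpolation), averaging is done directly on pixel values (ignoring camera response and gamma), and `synthesize_motion_blur` is a hypothetical helper.

```python
import numpy as np

def synthesize_motion_blur(frames, window):
    """Average `window` consecutive sharp frames into blurred images.

    frames: array of shape (T, H, W) or (T, H, W, C) from a high-frame-rate video
    Returns a list of (blurred, sharp) pairs: each blurred image is the mean of
    a non-overlapping window of frames, and its target is the middle frame.
    """
    frames = np.asarray(frames, dtype=np.float64)
    pairs = []
    for start in range(0, len(frames) - window + 1, window):
        clip = frames[start:start + window]
        blurred = clip.mean(axis=0)      # approximate a long exposure
        sharp = clip[window // 2]        # middle frame as ground truth
        pairs.append((blurred, sharp))
    return pairs

# A single bright pixel moving one column per frame across 5 frames.
frames = np.zeros((5, 1, 5))
for t in range(5):
    frames[t, 0, t] = 1.0

pairs = synthesize_motion_blur(frames, window=5)
blurred, sharp = pairs[0]
print(blurred.tolist())  # [[0.2, 0.2, 0.2, 0.2, 0.2]]  (a motion streak)
print(sharp.tolist())    # [[0.0, 0.0, 1.0, 0.0, 0.0]]  (the sharp target)
```

A deblurring network would then be trained to map `blurred` back to `sharp`, which is the "learn the inverse" half of the idea.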

