Paper introduction: SlowFast Networks for Video Recognition, by Toru Tamaki
Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He, SlowFast Networks for Video Recognition, Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 6202-6211
https://openaccess.thecvf.com/content_ICCV_2019/html/Feichtenhofer_SlowFast_Networks_for_Video_Recognition_ICCV_2019_paper.html
[DL Reading Group] A Style-Based Generator Architecture for Generative Adversarial Networks, by Deep Learning JP
This document discusses style-based generative adversarial networks and the techniques behind them. It introduces adaptive instance normalization (AdaIN), which aligns the per-channel mean and variance of features with those of a target style. It also covers mixing regularization, which trains the generator with two latent codes feeding different layers, and perceptual path length, which measures how smoothly generated images change as the latent code is interpolated.
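The AdaIN operation described above can be sketched in a few lines: normalize each channel of the content features, then rescale with the style features' channel-wise statistics. This is a minimal NumPy sketch on (C, H, W) arrays, not the StyleGAN implementation itself.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization: normalize the content feature
    map per channel, then shift/scale it so its channel-wise mean and
    standard deviation match those of the style features.
    Both inputs have shape (C, H, W)."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    normalized = (content - c_mean) / (c_std + eps)
    return normalized * s_std + s_mean
```

After the call, each output channel carries the style's statistics while keeping the content's spatial structure.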
[DL Reading Group] Large Scale GAN Training for High Fidelity Natural Image Synthesis, by Deep Learning JP
This document summarizes research on improving the fidelity of natural image synthesis using generative adversarial networks (GANs). The key points discussed include:
1. Using hierarchical latent spaces and shared embeddings to allow controlling image synthesis at different levels of detail.
2. Applying a "truncation trick" during sampling to generate higher quality images by truncating the latent distribution.
3. Adding orthogonal regularization to the generator to improve training stability and image quality.
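The truncation trick in point 2 can be sketched as rejection resampling of the latent vector: any component whose magnitude exceeds a threshold is redrawn, concentrating samples near the mode of the prior. The function name, threshold value, and shapes below are illustrative, not from the paper's code.

```python
import numpy as np

def truncated_z(batch, dim, threshold=0.5, rng=None):
    """Truncation trick sketch: draw latents from a standard normal and
    resample any component with |z| > threshold. Smaller thresholds
    trade sample diversity for higher per-sample fidelity."""
    if rng is None:
        rng = np.random.default_rng()
    z = rng.standard_normal((batch, dim))
    mask = np.abs(z) > threshold
    while mask.any():
        z[mask] = rng.standard_normal(mask.sum())
        mask = np.abs(z) > threshold
    return z
```

The truncated latents are then fed to the generator in place of unrestricted samples at inference time only; training still uses the full prior.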
Mask R-CNN extends Faster R-CNN by adding a branch for predicting segmentation masks in parallel with bounding box recognition and classification. It introduces a new layer called RoIAlign to address misalignment issues in the RoIPool layer of Faster R-CNN. RoIAlign improves mask accuracy by 10-50% by removing quantization and properly aligning extracted features. Mask R-CNN runs at 5fps with only a small overhead compared to Faster R-CNN.
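The core operation that lets RoIAlign avoid RoIPool's quantization is bilinear interpolation at fractional coordinates. This is a minimal sketch of that sampling step for a single 2-D feature map; a full RoIAlign layer would average several such samples per output bin.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample a 2-D feature map at a fractional (y, x) location with
    bilinear interpolation, instead of snapping to the nearest integer
    cell as RoIPool's quantization does."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feat.shape[0] - 1)
    x1 = min(x0 + 1, feat.shape[1] - 1)
    wy, wx = y - y0, x - x0  # fractional offsets inside the cell
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])
```

Because the sampled value varies smoothly with (y, x), gradients flow through the RoI coordinates, which is what restores the pixel-level alignment the mask branch needs.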
This document summarizes recent developments in action recognition using deep learning techniques. It discusses early approaches using improved dense trajectories and two-stream convolutional neural networks. It then focuses on advances using 3D convolutional networks, enabled by large video datasets like Kinetics. State-of-the-art results are achieved using inflated 3D convolutional networks and temporal aggregation methods like temporal linear encoding. The document provides an overview of popular datasets and challenges and concludes with tips on training models at scale.
Disentangled Representation Learning of Deep Generative Models, by Ryohei Suzuki
This document discusses disentangled representation learning in deep generative models. It explains that generative models can generate realistic images but it is difficult to control specific attributes of the generated images. Recent research aims to learn disentangled representations where each latent variable corresponds to an independent perceptual factor, such as object pose or color. Methods described include InfoGAN, β-VAE, spatial conditional batch normalization, hierarchical latent variables, and StyleGAN's hierarchical modulation approach. Measuring entanglement through perceptual path length and linear separability is also discussed. The document suggests disentangled representation learning could help applications in biology and medicine by providing better explanatory variables for complex phenomena.
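The perceptual path length measurement mentioned above can be sketched as follows: sample pairs of latents, take two nearby points on the interpolation path between them, and average the (scaled) perceptual distance between the corresponding generated images. Here `generate` and `distance` are placeholders supplied by the caller; the paper uses a VGG-based perceptual metric, but any image distance works for the sketch.

```python
import numpy as np

def slerp(a, b, t):
    """Spherical interpolation between latent vectors a and b."""
    a_n = a / np.linalg.norm(a)
    b_n = b / np.linalg.norm(b)
    omega = np.arccos(np.clip(a_n @ b_n, -1.0, 1.0))
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

def perceptual_path_length(generate, distance, dim, n=64, eps=1e-4, rng=None):
    """PPL sketch: average distance between images generated from two
    interpolation points eps apart, scaled by 1/eps**2. A smooth,
    disentangled latent space yields a small value."""
    if rng is None:
        rng = np.random.default_rng()
    total = 0.0
    for _ in range(n):
        z1 = rng.standard_normal(dim)
        z2 = rng.standard_normal(dim)
        t = rng.uniform()
        img_a = generate(slerp(z1, z2, t))
        img_b = generate(slerp(z1, z2, t + eps))
        total += distance(img_a, img_b) / eps ** 2
    return total / n
```

A generator whose output barely changes along the path scores near zero, while abrupt changes (a sign of entanglement) inflate the score.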
Soft Actor-Critic is an off-policy maximum entropy deep reinforcement learning algorithm with a stochastic actor, presented in a 2018 ICML paper by researchers from UC Berkeley. It extends the actor-critic framework by adding an entropy term to the reward, which encourages exploration and lets the agent learn stochastic policies that remain effective in environments with complex or sparse rewards. Using deep neural networks to approximate the policy and action-value functions, the algorithm was shown to learn robust policies on continuous control tasks.
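The entropy-augmented objective described above can be made concrete with two small formulas: the soft Bellman target adds an entropy bonus of -alpha * log pi to the bootstrapped value, and the actor minimizes E[alpha * log pi(a|s) - Q(s, a)]. This is a minimal sketch of those two expressions, assuming the temperature alpha=0.2 as an illustrative default; it omits the twin critics and target networks a full implementation uses.

```python
import numpy as np

def soft_value_target(reward, next_q, next_log_prob, alpha=0.2, gamma=0.99, done=0.0):
    """Soft Bellman backup: the entropy bonus -alpha * log pi(a'|s')
    is folded into the target value, rewarding stochastic policies."""
    return reward + gamma * (1.0 - done) * (next_q - alpha * next_log_prob)

def soft_actor_loss(log_probs, q_values, alpha=0.2):
    """Entropy-regularized actor objective: minimize
    E[alpha * log pi(a|s) - Q(s, a)], i.e. maximize Q plus entropy."""
    return float(np.mean(alpha * log_probs - q_values))
```

Higher alpha weights entropy more heavily, pushing the policy toward exploration; alpha -> 0 recovers a standard (deterministic-leaning) actor-critic objective.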