【DL輪読会】NeRF-VAE: A Geometry Aware 3D Scene Generative ModelDeep Learning JP
NeRF-VAE is a 3D scene generative model that combines Neural Radiance Fields (NeRF) and Generative Query Networks (GQN) with a variational autoencoder (VAE). It uses a NeRF decoder to generate novel views conditioned on a latent code. An encoder extracts latent codes from input views. During training, it maximizes the evidence lower bound to learn the latent space of scenes and allow for novel view synthesis. NeRF-VAE aims to generate photorealistic novel views of scenes by leveraging NeRF's view synthesis abilities within a generative model framework.
【DL輪読会】NeRF-VAE: A Geometry Aware 3D Scene Generative ModelDeep Learning JP
NeRF-VAE is a 3D scene generative model that combines Neural Radiance Fields (NeRF) and Generative Query Networks (GQN) with a variational autoencoder (VAE). It uses a NeRF decoder to generate novel views conditioned on a latent code. An encoder extracts latent codes from input views. During training, it maximizes the evidence lower bound to learn the latent space of scenes and allow for novel view synthesis. NeRF-VAE aims to generate photorealistic novel views of scenes by leveraging NeRF's view synthesis abilities within a generative model framework.
論文紹介:Grad-CAM: Visual explanations from deep networks via gradient-based loca...Kazuki Adachi
Selvaraju, Ramprasaath R., et al. "Grad-cam: Visual explanations from deep networks via gradient-based localization." The IEEE International Conference on Computer Vision (ICCV), 2017, pp. 618-626
ConvMixer is a simple CNN-based model that achieves state-of-the-art results on ImageNet classification. It divides the input image into patches and embeds them into high-dimensional vectors, similar to ViT. However, unlike ViT, it does not use attention but instead applies simple convolutional layers between the patch embedding and classification layers. Experiments show that despite its simplicity, ConvMixer outperforms more complex models like ResNet, ViT, and MLP-Mixer on ImageNet, demonstrating that patch embeddings may be as important as attention mechanisms for vision tasks.
論文紹介:Grad-CAM: Visual explanations from deep networks via gradient-based loca...Kazuki Adachi
Selvaraju, Ramprasaath R., et al. "Grad-cam: Visual explanations from deep networks via gradient-based localization." The IEEE International Conference on Computer Vision (ICCV), 2017, pp. 618-626
ConvMixer is a simple CNN-based model that achieves state-of-the-art results on ImageNet classification. It divides the input image into patches and embeds them into high-dimensional vectors, similar to ViT. However, unlike ViT, it does not use attention but instead applies simple convolutional layers between the patch embedding and classification layers. Experiments show that despite its simplicity, ConvMixer outperforms more complex models like ResNet, ViT, and MLP-Mixer on ImageNet, demonstrating that patch embeddings may be as important as attention mechanisms for vision tasks.