An introduction to the AAAI 2023 paper "Are Transformers Effective for Time Series Forecasting?" and HuggingFace's "Yes, Transformers are Effective for Time Series Forecasting (+ Autoformer)".
The document discusses FactorVAE, a method for disentangling latent representations in variational autoencoders (VAEs). It introduces Total Correlation (TC) as a penalty term that encourages independence between latent variables. TC is added to the standard VAE objective to guide the model toward disentangled representations. The document details how TC is defined and how it is estimated with the density-ratio trick from generative adversarial networks. It also discusses how FactorVAE uses TC to learn disentangled representations and how those representations can be evaluated with a disentanglement metric.
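For reference, the quantities named in this summary can be written out as below. This is a sketch in commonly used FactorVAE notation, which the summary itself does not give: the symbols $q(z)$ (aggregate posterior), $\gamma$ (penalty weight), $d$ (number of latent dimensions), and $D$ (a discriminator trained to tell samples of $q(z)$ from samples of $\prod_j q(z_j)$) are assumptions introduced here for illustration.

```latex
\begin{align}
\mathrm{TC}(z) &= \mathrm{KL}\Big(q(z)\,\Big\|\,\prod_{j=1}^{d} q(z_j)\Big), \\
\mathcal{L} &= \frac{1}{N}\sum_{i=1}^{N}\Big[
  \mathbb{E}_{q(z\mid x^{(i)})}\big[\log p(x^{(i)}\mid z)\big]
  - \mathrm{KL}\big(q(z\mid x^{(i)})\,\|\,p(z)\big)\Big]
  - \gamma\,\mathrm{TC}(z), \\
\mathrm{TC}(z) &\approx \mathbb{E}_{q(z)}\Big[\log\frac{D(z)}{1-D(z)}\Big].
\end{align}
```

The first line defines TC, the second adds it as a penalty to the usual VAE lower bound (to be maximized), and the third is the density-ratio estimate, where $D(z)$ is read as the estimated probability that $z$ came from $q(z)$ rather than from the product of its marginals.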
This document discusses various methods for calculating Wasserstein distance between probability distributions, including:
- Sliced Wasserstein distance, which projects distributions onto lower-dimensional spaces to enable efficient 1D optimal transport calculations.
- Max-sliced Wasserstein distance, which focuses sampling on the most informative projection directions.
- Generalized sliced Wasserstein distance, which replaces the linear projections of the standard Radon transform with more flexible nonlinear projections (generalized Radon transforms).
- Augmented sliced Wasserstein distance, which applies a learned transformation to distributions before projecting, allowing more expressive matching between distributions.
These sliced/generalized Wasserstein distances have been used as loss functions for generative models with promising results.
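The basic sliced variant in the list above can be sketched in a few lines of NumPy. This is a minimal Monte Carlo sketch, not the implementation from the summarized document: the function name `sliced_wasserstein`, its parameters, and the restriction to two sample sets of equal size are assumptions made for illustration. It covers only plain random slicing, not the max-sliced, generalized, or augmented variants.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y: arrays of shape (n, d) holding the same number of samples n.
    (Hypothetical helper; equal sample counts are assumed for simplicity.)
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Draw random directions uniformly on the unit sphere in R^d.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both sample sets onto every direction: shape (n, n_projections).
    x_proj = x @ theta.T
    y_proj = y @ theta.T
    # In 1D, optimal transport between equal-size samples pairs sorted values.
    x_sorted = np.sort(x_proj, axis=0)
    y_sorted = np.sort(y_proj, axis=0)
    # Average |difference|^p over samples and directions, then take the p-th root.
    return np.mean(np.abs(x_sorted - y_sorted) ** p) ** (1.0 / p)

# Usage: compare a sample with a same-distribution sample and a shifted one.
rng = np.random.default_rng(1)
a = rng.normal(size=(500, 3))
a2 = rng.normal(size=(500, 3))
b = rng.normal(loc=2.0, size=(500, 3))
print(sliced_wasserstein(a, a2))  # small: same underlying distribution
print(sliced_wasserstein(a, b))   # larger: mean shifted by 2 in every coordinate
```

Sorting is what makes each 1D problem cheap: the cost per direction is O(n log n) instead of solving a full optimal transport problem in d dimensions.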