[Paper Review] MisGAN: Learning from Incomplete Data with Generative Adversarial Networks (ICLR'19)
1. MisGAN
Learning from Incomplete Data with Generative Adversarial Networks
Steven Cheng-Xian Li
University of Massachusetts Amherst
Jihoo Kim
datartist@hanyang.ac.kr
Dept. of Computer and Software, Hanyang University
ICLR’19
2. Abstract
GANs provide an effective way to model complex distributions.
However, typical GANs require fully-observed data during training.
In this paper, we present a GAN-based framework for learning from complex, high-dimensional incomplete data.
The proposed framework learns a complete data generator
along with a mask generator that models the missing data distribution.
We evaluate the proposed framework under the MCAR (missing completely at random) assumption.
3. 1. Introduction
Unlike likelihood-based methods, GANs are implicit probabilistic models
that represent a probability distribution through a generator
which learns to directly produce samples from the desired distribution.
GANs have been shown to be very successful in a range of applications
- Generating photorealistic images (2018)
- Image inpainting (2016, 2017)
Training GANs normally requires access to a large collection of fully-observed data.
However, it is not always possible to obtain a large amount of fully-observed data.
4. 1. Introduction
The generative process for incompletely observed data (Little & Rubin, 2014):
x ~ p_θ(x), m ~ p_φ(m | x)
- x: a complete data vector
- m: a binary mask that determines which entries in x to reveal
- θ: the unknown parameters of the data distribution
- φ: the unknown parameters of the mask distribution
- x_obs: the observed elements of x
- x_mis: the elements that are missing according to the mask m
5. 1. Introduction
The unknown parameters are estimated by maximizing the following marginal likelihood, integrating over the unobserved x_mis:
p(x_obs, m) = ∫ p_θ(x_obs, x_mis) p_φ(m | x_obs, x_mis) dx_mis
Little & Rubin (2014) characterize the missing data mechanism in terms of independence between the complete data x and the mask m:
① MCAR (missing completely at random): p_φ(m | x) = p_φ(m)
② MAR (missing at random): p_φ(m | x) = p_φ(m | x_obs)
③ NMAR (not missing at random): m depends on x_mis, and possibly also on x_obs
6. 1. Introduction
Most work on incomplete data assumes MCAR or MAR, since under these assumptions
p(x_obs, m) can be factorized into p_φ(m | x_obs) p_θ(x_obs).
→ The missing data mechanism can be ignored when learning the data generating model,
while still yielding correct estimates for θ.
When p_θ(x) does not admit efficient marginalization over x_mis, estimation of θ is usually
performed by maximizing a variational lower bound.
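To see why the mechanism can be ignored under MAR, the log-likelihood splits into two terms that can be optimized separately; a one-step sketch of the standard argument, stated here for completeness:

```latex
\log p(x_{\text{obs}}, m)
  = \log \big( p_\varphi(m \mid x_{\text{obs}})\, p_\theta(x_{\text{obs}}) \big)
  = \underbrace{\log p_\varphi(m \mid x_{\text{obs}})}_{\text{no } \theta}
  + \underbrace{\log p_\theta(x_{\text{obs}})}_{\text{maximize over } \theta}
```

Since the first term does not depend on θ, maximizing the likelihood over θ reduces to maximizing log p_θ(x_obs).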
7. 1. Introduction
The primary contribution of this paper is the development of a GAN-based framework for
learning high-dimensional data distributions in the presence of incomplete observations.
Our framework introduces an auxiliary GAN for learning a mask distribution to model
the missingness.
The masks are used to “mask” generated complete data by filling the indicated missing
entries with a constant value.
The complete data generator is trained so that the resulting masked data are
indistinguishable from real incomplete data that are masked similarly.
8. 1. Introduction
Our framework builds on the ideas of AmbientGAN (2018).
AmbientGAN modifies the discriminator of a GAN to distinguish corrupted real samples
from corrupted generated samples under a range of corruption processes.
Missing data can be seen as a special type of corruption.
AmbientGAN assumes the measurement process is known and characterized by only a few parameters,
which is not the case in general missing data problems.
9. 1. Introduction
We provide empirical evidence that the proposed framework is able to effectively learn
complex, high-dimensional data distributions from highly incomplete data.
We further show how the architecture can be used to generate high-quality imputations.
10. 2. MisGAN: A GAN for Missing Data
Incomplete data is represented as pairs (x, m):
- x: a partially-observed data vector
- m: a corresponding mask, where m_d = 1 means x_d is observed,
and m_d = 0 means x_d is missing and contains an arbitrary value that we should ignore.
Instead of keeping only the observed values, this pair representation leads to a cleaner description of the proposed MisGAN
and suggests how MisGAN can be implemented efficiently.
11. 2. MisGAN: A GAN for Missing Data
Two key ideas:
1. We explicitly model the missing data process using a mask generator.
Since the masks in the incomplete dataset are fully observed, we can estimate their distribution.
2. We train the complete data generator adversarially by masking its outputs using generated
masks and the masking operator f_τ, and comparing them to real incomplete data that are masked similarly.
f_τ(x, m) = x ⊙ m + τ(1 − m): the masking operator, which fills in missing entries with a constant value τ (a code sketch follows below).
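A minimal PyTorch sketch of this masking operator (illustrative function and argument names, not the authors' code):

```python
import torch

def mask_data(x: torch.Tensor, m: torch.Tensor, tau: float = 0.0) -> torch.Tensor:
    """Masking operator f_tau(x, m) = x * m + tau * (1 - m).

    x: batch of (possibly generated) complete data.
    m: batch of binary (or relaxed, in [0, 1]) masks of the same shape.
    tau: constant used to fill in the missing entries.
    """
    return x * m + tau * (1.0 - m)
```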
12. 2. MisGAN: A GAN for Missing Data
We use two generator-discriminator pairs: (G_x, D_x) for the data and (G_m, D_m) for the masks.
We focus on MCAR, where the two generators are independent of each other
and have their own noise distributions.
[Architecture diagram: D_m compares real masks against fake masks from G_m;
D_x compares masked real data against generated data from G_x masked by generated masks.
The corresponding loss functions for masks and data are given on the next slide.]
13. 2. MisGAN: A GAN for Missing Data
We optimize the generators and the discriminators according to the following objectives.
Loss function for the masks:
L_m(D_m, G_m) = E_{(x,m)~p_D}[D_m(m)] − E_{ε~p_ε}[D_m(G_m(ε))]
Loss function for the data:
L_x(D_x, G_x, G_m) = E_{(x,m)~p_D}[D_x(f_τ(x, m))] − E_{z~p_z, ε~p_ε}[D_x(f_τ(G_x(z), G_m(ε)))]
The data generator minimizes L_x; the mask generator minimizes L_m + α·L_x, where α is a coefficient that couples mask generation to the data loss (a code sketch of these losses follows below).
The losses above follow the Wasserstein GAN formulation (Arjovsky et al., 2017).
We find that choosing a small value such as α = 0.2 improves performance.
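A minimal PyTorch sketch of computing these two losses for one batch (assumed module names G_x, G_m, D_x, D_m and assumed noise dimensions z_dim/eps_dim; the Lipschitz constraint and optimizer steps of WGAN training are omitted):

```python
import torch

def misgan_losses(G_x, G_m, D_x, D_m, x_real, m_real,
                  z_dim=128, eps_dim=128, tau=0.0):
    """Compute the WGAN-style MisGAN losses for one batch.

    Discriminators maximize these losses (subject to a Lipschitz
    constraint); G_x minimizes loss_x, and G_m minimizes
    loss_m + alpha * loss_x.
    """
    n = x_real.size(0)
    z = torch.randn(n, z_dim)      # noise for the data generator
    eps = torch.randn(n, eps_dim)  # noise for the mask generator

    x_fake = G_x(z)
    m_fake = G_m(eps)              # relaxed masks in [0, 1]

    def f_tau(x, m):               # masking operator from Section 2
        return x * m + tau * (1.0 - m)

    # L_m: real masks vs. generated masks
    loss_m = D_m(m_real).mean() - D_m(m_fake).mean()
    # L_x: masked real data vs. masked generated data
    loss_x = D_x(f_tau(x_real, m_real)).mean() - D_x(f_tau(x_fake, m_fake)).mean()
    return loss_m, loss_x
```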
15. Wasserstein GAN (Arjovsky et al., 2017), Facebook AI Research
Wasserstein GAN (WGAN) proposes a new cost function
based on the Wasserstein distance, which has a smoother gradient everywhere.
Arjovsky et al. (2017) mathematically analyze the training problems of standard GANs.
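For context, the WGAN critic optimizes the Kantorovich-Rubinstein dual form of the Wasserstein-1 distance over 1-Lipschitz functions (standard formulation, added here for reference, not taken from the slides):

```latex
W(p_r, p_g) = \sup_{\lVert D \rVert_L \le 1}
  \mathbb{E}_{x \sim p_r}[D(x)] - \mathbb{E}_{\tilde{x} \sim p_g}[D(\tilde{x})]
```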
17. 2. MisGAN: A GAN for Missing Data
The data discriminator takes the masked samples as input, as if the data were fully-observed.
This allows us to use any existing architecture designed for complete data.
The masks are binary, and discrete data generating processes have zero gradient almost everywhere.
To carry out gradient-based training for GANs, we relax the output of the mask generator G_m to be continuous in [0, 1] (a sketch follows below).
Note that the discriminator in MisGAN is unaware of which entries are missing in the masked input samples,
and does not even need to know which value τ is used for masking (theoretical analysis in the next section).
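One way to implement this relaxation is a low-temperature sigmoid on the mask generator's final layer; a minimal sketch, with the temperature value as an illustrative assumption rather than a figure from the slides:

```python
import torch

def relaxed_mask(logits: torch.Tensor, temperature: float = 0.66) -> torch.Tensor:
    """Map mask-generator logits to soft masks in (0, 1).

    A low temperature pushes outputs toward {0, 1} while keeping
    gradients nonzero almost everywhere, enabling gradient-based training.
    """
    return torch.sigmoid(logits / temperature)
```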
19. 3. Theoretical Results
Two important questions:
Q1. Does the choice of the filled-in value τ
affect the ability to recover the data distribution?
Q2. Does information about the location of missing values
affect the ability to recover the data distribution?
24. 4. Missing Data Imputation
We show how to impute missing data according to p(x_mis | x_obs)
by equipping MisGAN with an imputer G_i, accompanied by a corresponding discriminator D_i.
The imputer takes an incomplete pair (x, m) together with a noise vector ω drawn from a noise distribution p_ω and fills in the missing entries.
Loss function for the imputer:
L_i(D_i, G_i, G_x) = E_{z~p_z}[D_i(G_x(z))] − E_{(x,m)~p_D, ω~p_ω}[D_i(G_i(x, m, ω))]
The three generators are optimized jointly (a sketch of the imputer follows below):
- Masks: min_{G_m} L_m + α·L_x with α = 0.2. This encourages the generated masks
to match the distribution of the real masks,
and the masked generated complete samples to match masked real data.
- Data: min_{G_x} L_x + β·L_i with β = 0.1. This encourages the generated complete data
to match the distribution of the imputed real data,
in addition to having the masked generated data match the masked real data.
- Imputer: min_{G_i} L_i, so that imputed real data become indistinguishable (to D_i) from generated complete data.
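A minimal PyTorch sketch of the imputer's forward pass, which preserves observed entries and fills in the rest from a learned network (g_net is an assumed inner network, and the way noise enters is one plausible reading, not taken verbatim from the slides):

```python
import torch
import torch.nn as nn

class Imputer(nn.Module):
    """Fill in missing entries of (x, m) while preserving observed values."""

    def __init__(self, g_net: nn.Module):
        super().__init__()
        self.g_net = g_net  # maps a noise-filled input to a completed sample

    def forward(self, x, m, omega):
        # Overwrite the (arbitrary) missing values with noise omega,
        # then let the network propose a completion.
        filled = x * m + omega * (1.0 - m)
        completed = self.g_net(filled)
        # Keep observed entries; use the network output only where missing.
        return x * m + completed * (1.0 - m)
```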
25. 4. Missing Data Imputation
We can also train a stand-alone imputer using only the imputer loss L_i
with a pre-trained data generator G_x.
Moreover, it is also possible to train the imputer to target a different missing distribution
with a pre-trained data generator G_x alone, without access to the original (incomplete) training data.
27. 5. Experiments
Data:
- MNIST: 28x28 handwritten digit images
- CIFAR-10: 32x32 color images from 10 classes
- CelebA: 64x64 face images (202,599 images)
The range of pixel values is rescaled for all datasets.
Missing data distributions (a sketch of these mask samplers follows below):
- Square observation: all pixels are missing except for a square
occurring at a random location on the image.
- Dropout: each pixel is independently missing
according to a Bernoulli distribution.
- Variable-size rectangular observation: all pixels are missing except for a rectangular observed region
whose width and height are independently drawn from 25% to 75% of the image length.
Evaluation metric:
- FID, the Fréchet Inception Distance (Heusel et al., 2017)
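A small NumPy sketch of these three mask distributions (hypothetical helper names; side lengths and observation rates are parameters, with the 25%-75% range taken from the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

def square_observation_mask(size: int, obs: int) -> np.ndarray:
    """All pixels missing except an obs x obs square at a random location."""
    m = np.zeros((size, size), dtype=np.float32)
    top = rng.integers(0, size - obs + 1)
    left = rng.integers(0, size - obs + 1)
    m[top:top + obs, left:left + obs] = 1.0
    return m

def dropout_mask(size: int, p_obs: float) -> np.ndarray:
    """Each pixel independently observed with probability p_obs."""
    return (rng.random((size, size)) < p_obs).astype(np.float32)

def variable_rect_mask(size: int, lo: float = 0.25, hi: float = 0.75) -> np.ndarray:
    """Observed region is a rectangle whose width and height are drawn
    from lo..hi of the image length."""
    w = rng.integers(int(lo * size), int(hi * size) + 1)
    h = rng.integers(int(lo * size), int(hi * size) + 1)
    m = np.zeros((size, size), dtype=np.float32)
    top = rng.integers(0, size - h + 1)
    left = rng.integers(0, size - w + 1)
    m[top:top + h, left:left + w] = 1.0
    return m
```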
28. 5. Experiments 5.1 Empirical Study of MisGAN on MNIST
1. Architectures
- Conv-MisGAN: MisGAN with convolutional networks, following DCGAN (Radford et al., 2015)
- FC-MisGAN: MisGAN with fully-connected networks
2. Baseline
- ConvAC: the generative convolutional arithmetic circuit (Sharir et al., 2016)
→ capable of learning from large-scale incomplete data
3. Results
- Figures 3-6, next slides...
30. 5. Experiments 5.1 Empirical Study of MisGAN on MNIST
[Figures: data and mask samples generated by Conv-MisGAN, and data samples generated by MisGAN,
under the square and variable-size observation patterns.]
MisGAN outperforms ConvAC.
31. 5. Experiments 5.1 Empirical Study of MisGAN on MNIST
4. Ablation study
We point out that the mask discriminator in MisGAN is important for learning the correct distribution.
[Figures: two failure cases of AmbientGAN, which is essentially equivalent to MisGAN without the mask discriminator;
generated data and mask samples are shown for each case.]
32. 5. Experiments 5.1 Empirical Study of MisGAN on MNIST
5. Missing data imputation
[Figure: imputation samples. Inside of box → observed pixels;
outside of box → generated pixels; each row → the same incomplete input.]
The imputer can produce a variety of different imputed results.
33. 5. Experiments 5.2 Quantitative Evaluation
We focus on evaluating MisGAN on the missing data imputation task.
1. Baselines: zero/mean imputation, matrix factorization, and GAIN (Generative Adversarial Imputation Network).
2. Evaluation of imputation: FID between the imputed data and the original fully-observed data (see the formula after this list).
3. Architecture: for MNIST, a fully-connected imputer network; for CIFAR-10 and CelebA, a five-layer U-Net architecture (Ronneberger et al., 2015).
4. Results: next slides...
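For reference, FID compares Gaussian fits (μ_r, Σ_r) and (μ_g, Σ_g) to Inception features of the real and imputed data (standard definition from Heusel et al., 2017, added here for context):

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g
  - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```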
34. 5. Experiments 5.2 Quantitative Evaluation
MisGAN consistently outperforms other methods in all cases, especially under high missing rates.
Training MisGAN is more stable than training GAIN.
35. 6. Discussion and Future Work
This work presents and evaluates a highly flexible framework for learning
standard GAN data generators in the presence of missing data.
We only focus on the MCAR case in this work.
MisGAN can be easily extended to both the MAR and NMAR cases.
We have tried the modified architecture and it showed similar results.
This suggests that the extra dependencies may not adversely affect learnability.
We leave the formal evaluation of this modified framework for future work.