This document discusses using generative adversarial networks (GANs) to generate 3D volumetric data. Specifically, it extends GANs to 3D voxel data by applying 3D convolutions and deconvolutions (transposed convolutions). A GAN is trained on chair models from ShapeNet, represented as binary voxel grids of size 32x32x32. Minibatch discrimination and mutual information reconstruction are used to keep the generator from collapsing and to add semantic meaning to the generator input. The GAN generated chair-like 3D models, but the small dataset led to noisy results and training did not converge.
3D Volumetric Data Generation with Generative Adversarial Networks
Hiroyuki Vincent Yamazaki
Keio University hvy@keio.jp
Preferred Networks Summer Internship, 2016
1. Introduction
Background
Generative Adversarial Networks (GANs) [1] have achieved state-of-the-art performance in unsupervised
learning, generating synthetic images after training on datasets such as MNIST or, for multi-channel
images, ImageNet.
However, these networks have not yet been extended to higher-dimensional data such as volumetric 3D data.
Generated 3D models have various applications in entertainment and could be used as an alternative to
existing procedural methods for creating graphics.
This study demonstrates the capabilities of GAN-based architectures for generating practical 3D models
by applying 3-dimensional convolutions and deconvolutions* on voxel data.
*Transposed convolutions
Goal
• Extend GANs to 3D volumetric data, training on a single class
• Control the shapes of the generated models, e.g. by interpolation
2. Training Data
3D CAD models from ShapeNet [2]
• Class: Chair
• Instances: 4846
Preprocessing
• Voxelization
• 3D CAD models are converted into binary (0 or 1) voxel grids with dimensions (32, 32, 32) [3]
• Normalization
• No normalization is applied. Data is in range [0, 1]
• Other
• Remove bad samples and centre the models in the space
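The centring step can be illustrated with a minimal NumPy sketch. The slides do not say how the models were centred, so this assumes bounding-box centring; the helper name `center_voxels` is hypothetical.

```python
import numpy as np

def center_voxels(grid):
    """Shift the occupied voxels so their bounding box is centred in the grid."""
    occupied = np.argwhere(grid > 0)
    if occupied.size == 0:
        return grid.copy()                        # nothing to centre
    lo = occupied.min(axis=0)
    hi = occupied.max(axis=0) + 1                 # exclusive upper corner
    size = hi - lo
    start = (np.array(grid.shape) - size) // 2    # target lower corner
    centered = np.zeros_like(grid)
    src = tuple(slice(l, h) for l, h in zip(lo, hi))
    dst = tuple(slice(s, s + n) for s, n in zip(start, size))
    centered[dst] = grid[src]
    return centered

# Toy example: a 4x4x4 block pushed into one corner of a 32^3 grid.
grid = np.zeros((32, 32, 32), dtype=np.uint8)
grid[0:4, 0:4, 0:4] = 1
centered = center_voxels(grid)
```

In this toy case the block ends up at indices 14 to 17 on each axis, and the number of occupied voxels is unchanged.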
[Figures: Training Data Volume Distribution; Mean 3D Model]
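Statistics of this kind could be computed roughly as follows. This is a sketch on random stand-in data, and it assumes that "volume" means the number of occupied voxels per model, which the slides do not state explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the voxelized training set: (num_models, 32, 32, 32), values in {0, 1}.
voxels = (rng.random((8, 32, 32, 32)) < 0.1).astype(np.float32)

mean_model = voxels.mean(axis=0)         # per-voxel occupancy frequency
volumes = voxels.sum(axis=(1, 2, 3))     # occupied voxels per model (assumed meaning of "volume")
hist, bin_edges = np.histogram(volumes, bins=10)
```

Because the data is binary, the mean model can be read as the fraction of training models that occupy each voxel.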
3. Generative Adversarial Network
A GAN consists of a generator G and a discriminator D. In this case, both are feed-forward neural
networks that are trained simultaneously.
• The input to G is a random noise vector z sampled from a uniform or Gaussian distribution
Loss
• Softmax cross-entropies based on the predictions of D
• Separate losses for G and D defined by the minimax game
Optimal Discriminator Strategy
Optimization
• Adam for both G and D
• The learning rate of G is larger than that of D
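The softmax cross-entropy losses named in this section can be sketched in NumPy as below. The label convention (real = 1, generated = 0) and the non-saturating form of the generator loss are assumptions; the slides only state that the losses follow the minimax game.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

REAL, FAKE = 1, 0  # assumed label convention

def discriminator_loss(logits_real, logits_fake):
    """D should label training samples as real and generated samples as fake."""
    real = np.full(len(logits_real), REAL)
    fake = np.full(len(logits_fake), FAKE)
    return softmax_cross_entropy(logits_real, real) + softmax_cross_entropy(logits_fake, fake)

def generator_loss(logits_fake):
    """Non-saturating form (assumption): G is rewarded when D labels its samples as real."""
    return softmax_cross_entropy(logits_fake, np.full(len(logits_fake), REAL))

# An undecided D (all-zero logits) gives log(2) per cross-entropy term.
logits = np.zeros((4, 2))
d_loss = discriminator_loss(logits, logits)
g_loss = generator_loss(logits)
```

The two-column logits match a discriminator whose final layer has two outputs (generated/real), as listed in the Appendix.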
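[Figure: GAN overview. Random noise is fed to the Generator (Linear, Deconvolution, Batch Normalization, ReLU, Sigmoid), which outputs a generated 3D model. A random index selects a real 3D model from the training data. The Discriminator (Convolution, Linear, Leaky ReLU) receives a generated or real 3D model and outputs a generated/real prediction.]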
See Appendix for the network architecture and Adam parameters
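As a minimal sketch of what the Adam parameters in the Appendix control, the update below applies one Adam step with the generator and discriminator learning rates. The second β row of the Appendix is read as β2, and `eps` is an assumed default that the slides do not give.

```python
import numpy as np

def adam_step(param, grad, m, v, t, alpha, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

ALPHA_G, ALPHA_D = 0.001, 0.00005   # learning rates from the Appendix

w = np.array([1.0, -2.0])
g = np.array([0.5, -0.5])
zeros = np.zeros_like(w)
w_g, _, _ = adam_step(w, g, zeros, zeros, t=1, alpha=ALPHA_G)
w_d, _, _ = adam_step(w, g, zeros, zeros, t=1, alpha=ALPHA_D)
```

For the same gradient, the first generator step is 20 times larger than the discriminator step, which is one way of slowing D down relative to G.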
$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
$$D^{*}(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_G(x)}$$
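The optimal discriminator strategy follows from the minimax objective by maximizing the integrand pointwise for a fixed G, as shown in [1]. The derivation below is a short sketch of that argument; the optimum is written as D*(x), and p_G denotes the distribution of G(z) for z drawn from p_z.

```latex
\begin{aligned}
V(G, D) &= \int_x \Bigl[ p_{\mathrm{data}}(x) \log D(x) + p_G(x) \log\bigl(1 - D(x)\bigr) \Bigr] \, dx \\
0 &= \frac{\partial}{\partial D(x)} \Bigl[ p_{\mathrm{data}}(x) \log D(x) + p_G(x) \log\bigl(1 - D(x)\bigr) \Bigr]
   = \frac{p_{\mathrm{data}}(x)}{D(x)} - \frac{p_G(x)}{1 - D(x)} \\
\Rightarrow \quad D^{*}(x) &= \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_G(x)}
\end{aligned}
```

When G matches the data distribution exactly, p_G = p_data and D*(x) = 1/2 everywhere, so the discriminator can no longer tell generated models from real ones.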
Issues with GAN
• Collapsing generator
• G outputs similar 3D models for different inputs
• Non-semantic input z
• Interpolating z reveals sharp edges in the latent space, so there is no way to control the shape of the output
Improving the GAN
• Prevent the generator from collapsing
• Minibatch Discrimination [4] layer in D
• Embed semantic meaning into the input [5]
• Concatenate additional latent codes with z before feeding it to G
• Additional loss based on mutual information reconstruction by D
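The interpolation of z mentioned above can be sketched as a straight line between two noise vectors. The slides do not describe the interpolation scheme, so linear interpolation and the uniform range of z are assumptions, and `interpolate` is a hypothetical helper.

```python
import numpy as np

def interpolate(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced on the line from z_start to z_end."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * z_start + t * z_end

rng = np.random.default_rng(0)
z_a = rng.uniform(-1.0, 1.0, size=128)    # 128-d noise; the range is an assumption
z_b = rng.uniform(-1.0, 1.0, size=128)
path = interpolate(z_a, z_b, steps=5)     # shape (5, 128); each row would be fed to G
```

If neighbouring rows of `path` produce very different models, the latent space has the sharp edges described above.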
[Figure: Improved GAN overview. Random noise and latent codes are fed to the Generator (Linear, Deconvolution, Batch Normalization, ReLU, Sigmoid), which outputs a generated 3D model. A random index selects a real 3D model from the training data. The Discriminator (Convolution, Linear, Leaky ReLU, Minibatch Discrimination) outputs a generated/real prediction and a mutual information reconstruction.]
Minibatch Discrimination
Motivation
Prevent the generator from collapsing to a single point
Idea
Reproduce the diversity in the training data
Add a minibatch discrimination layer to D, before the generated/real prediction
For each minibatch fed to this layer, compute the L1 distance between all pairs of input vectors
Append this information to the given minibatch
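The layer can be sketched in NumPy following the formulation in [4], where each input vector is first projected by a learned tensor and the L1 distances are taken between the projected rows. The kernel count (64) and kernel dimension (16) come from the Appendix; the input width of 1024 is assumed from the preceding FC layer, and subtracting the self-similarity term is an optional choice made here.

```python
import numpy as np

def minibatch_discrimination(features, T):
    """features: (N, A) inputs; T: (A, B, C) tensor with B kernels of dimension C.
    Returns (N, A + B): the inputs with B cross-sample similarity features appended."""
    M = np.einsum('na,abc->nbc', features, T)                       # (N, B, C)
    # L1 distance between every pair of samples, per kernel: (N, N, B)
    l1 = np.abs(M[:, None, :, :] - M[None, :, :, :]).sum(axis=3)
    similarity = np.exp(-l1)                                        # 1 for identical rows
    o = similarity.sum(axis=1) - 1.0                                # drop self-similarity
    return np.concatenate([features, o], axis=1)

rng = np.random.default_rng(0)
N, A, B, C = 8, 1024, 64, 16     # B, C from the Appendix; A assumed; N shortened from 128
features = rng.normal(size=(N, A))
T = rng.normal(scale=0.01, size=(A, B, C))   # stands in for learned weights
out = minibatch_discrimination(features, T)

# A collapsed batch (all rows identical) yields the maximum similarity N - 1.
collapsed = minibatch_discrimination(np.tile(features[:1], (N, 1)), T)
```

A collapsed generator produces near-identical samples, so the appended features saturate and give D an easy signal for rejecting the batch.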
Mutual Information Reconstruction
Motivation
Embed semantic meaning in z
Idea
Maximize the mutual information that is preserved for latent codes C as they are passed through the
networks
Latent Codes, input to G
• C = [C1, C2, C3] (Concatenations)
• Categorical one-hot vector C1~Cat(K=2, p=0.5)
• Continuous C2~Unif(-1, 1)
• Continuous C3~Unif(-1, 1)
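Sampling the generator input with these latent codes could look like the sketch below. The 128 + 2 + 2 layout is read from the generator input size in the Appendix (128 noise dimensions, a 2-way one-hot vector, two continuous codes); the uniform range of z and the helper name are assumptions.

```python
import numpy as np

def sample_generator_input(batch_size, rng, z_dim=128):
    """Return (batch_size, z_dim + 2 + 2) generator inputs and the codes used."""
    z = rng.uniform(-1.0, 1.0, size=(batch_size, z_dim))   # noise range is an assumption
    labels = rng.integers(0, 2, size=batch_size)            # C1 ~ Cat(K=2, p=0.5)
    c1 = np.eye(2)[labels]                                   # one-hot, e.g. [0, 1]
    c2 = rng.uniform(-1.0, 1.0, size=(batch_size, 1))       # C2 ~ Unif(-1, 1)
    c3 = rng.uniform(-1.0, 1.0, size=(batch_size, 1))       # C3 ~ Unif(-1, 1)
    return np.concatenate([z, c1, c2, c3], axis=1), labels, c2, c3

rng = np.random.default_rng(0)
g_input, labels, c2, c3 = sample_generator_input(128, rng)
```

The sampled codes are returned alongside the input because they are needed again as targets when D reconstructs them.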
Reconstruction, output from D
• Categorical
• Softmax Cross Entropy
• Continuous
• Assume a fixed variance and compute the Gaussian negative log-likelihood based on the mean
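The two reconstruction terms can be combined as in the following sketch. It assumes that D's reconstruction output has four values (two logits for C1, one predicted mean each for C2 and C3), that the fixed standard deviation is 1, and that the terms are summed with equal weight; none of these details are stated in the slides.

```python
import numpy as np

def reconstruction_loss(d_out, labels, c_cont, sigma=1.0):
    """d_out: (N, 4) = [2 logits for C1 | predicted means for C2, C3]."""
    logits, mu = d_out[:, :2], d_out[:, 2:]
    # Categorical code: softmax cross-entropy.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    categorical = -log_probs[np.arange(len(labels)), labels].mean()
    # Continuous codes: Gaussian negative log-likelihood with fixed variance sigma^2.
    nll = 0.5 * np.log(2 * np.pi * sigma ** 2) + (c_cont - mu) ** 2 / (2 * sigma ** 2)
    continuous = nll.sum(axis=1).mean()
    return categorical + continuous

labels = np.array([0, 1, 1])
c_cont = np.array([[0.2, -0.4], [0.9, 0.1], [-0.3, 0.5]])
perfect_means = np.concatenate([np.zeros((3, 2)), c_cont], axis=1)
loss = reconstruction_loss(perfect_means, labels, c_cont)
```

With a fixed variance, the continuous term reduces to a squared error on the predicted mean plus a constant, so only the mean has to be learned.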
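[Figure: The generator input is the concatenation of z, c1 (e.g. [0, 1]), c2 and c3. The reconstruction outputs are a softmax for c1 and the means μ2 and μ3 for c2 and c3. A second panel shows the kernels of the Minibatch Discrimination Layer.]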
4. Results
• Minibatch size: 128
• Epochs: 100
[Figure: Generated 3D Models*]
*The blue models are their nearest models in the training dataset
[Figures: 3D Volume Distributions; Chair-likeness; Losses. Legend: Learned Distribution, True Distribution]
5. Conclusions
• GANs can be extended to 3D volumetric data using 3-dimensional convolutions and deconvolutions
• Smaller datasets (sparse data) lead to worse-looking, noisy models
• Partially mitigated by mutual information reconstruction and minibatch discrimination
• In many cases, D improves faster than G
• Gradients backpropagated through G saturate and training stops
• Training does not converge
Future Work
• Larger dataset with potentially multiple classes
• Balance training between G and D
• Heuristic
• Stop updating D while it is too strong
• Larger G, i.e. more parameters
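The "stop updating D while it is too strong" heuristic could be implemented as a simple gate on the discriminator loss. This is only an illustration: the function name and the threshold of 0.3 are hypothetical and would need tuning.

```python
def should_update_discriminator(d_loss, threshold=0.3):
    """Skip D's update while its loss is below `threshold` (i.e. D is too strong)."""
    return d_loss >= threshold

# Toy trace of discriminator losses over six iterations.
d_losses = [1.2, 0.8, 0.25, 0.1, 0.4, 0.9]
updates = [should_update_discriminator(loss) for loss in d_losses]
```

In a training loop, G would still be updated on every iteration, while D's optimizer step would run only when the gate returns True.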
References
[1] Goodfellow et al. (2014). Generative Adversarial Networks. CoRR, abs/1406.2661.
[2] Angel X. Chang et al. (2015). ShapeNet: An Information-Rich 3D Model Repository. CoRR, abs/1512.03012.
[3] Patrick Min. Binvox, 3D Mesh Voxelizer. http://www.patrickmin.com/binvox/
[4] Tim Salimans et al. (2016). Improved Techniques for Training GANs. CoRR, abs/1606.03498.
[5] Xi Chen et al. (2016). InfoGAN: Interpretable Representation Learning by Information Maximizing. CoRR, abs/1606.03657.
Appendix
GAN Architecture
Generator
• Input ∈ R^(128+2+2)
• FC 1024, BN, ReLU
• FC 16384, BN, ReLU
• DC 256 → 128, Kernel 4, Stride 2, BN, ReLU
• DC 128 → 64, Kernel 4, Stride 2, BN, ReLU
• Output: DC 64 → 1, Kernel 4, Stride 2, BN, ReLU
Discriminator
• Input: 32x32x32 3D voxel data
• Conv 1 → 64, Kernel 4, Stride 2, lReLU (leaky ReLU)
• Conv 64 → 128, Kernel 4, Stride 2, BN, lReLU
• Conv 128 → 256, Kernel 4, Stride 2, BN, lReLU
• FC 1024, BN, lReLU
• Minibatch Discrimination, Kernels 64, Kernel Dimension 16
• Output: FC 2 (Generated/Real prediction)
• FC 256, BN, lReLU
• Output: FC 2+2 (Mutual Information Reconstruction)
Adam Optimizer Parameters
      Generator   Discriminator
α     0.001       0.00005
β1    0.5         0.5
β2    0.999       0.999
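The layer sizes in the GAN architecture can be checked with the standard output-size formulas for convolutions and transposed convolutions. The padding of 1 is an assumption, since the slides list only kernel size and stride; the reshape of the FC 16384 output to 256 channels of 4x4x4 is inferred from the 256 input channels of the first deconvolution.

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Output length of a convolution along one axis."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    """Output length of a transposed convolution along one axis."""
    return stride * (size - 1) + kernel - 2 * pad

# Discriminator: three stride-2 convolutions applied to a 32^3 input.
d_sizes = [32]
for _ in range(3):
    d_sizes.append(conv_out(d_sizes[-1]))

# Generator: FC 16384 reshaped to 256 channels of side g_start, then three deconvolutions.
g_start = round((16384 / 256) ** (1 / 3))
g_sizes = [g_start]
for _ in range(3):
    g_sizes.append(deconv_out(g_sizes[-1]))
```

Under these assumptions the discriminator halves each axis from 32 down to 4, and the generator doubles each axis from 4 back up to 32, matching the (32, 32, 32) voxel grids.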