
3D Volumetric Data Generation with Generative Adversarial Networks



Hiroyuki Vincent Yamazaki, PFN Summer Internship 2016

Published in: Technology


  1. 1. 3D Volumetric Data Generation with Generative Adversarial Networks
     Hiroyuki Vincent Yamazaki, Keio University
     Preferred Networks Summer Internship, 2016
  2. 2. 1. Introduction
     Background
     Generative Adversarial Networks (GANs) [1] have achieved state-of-the-art performance in unsupervised learning, generating synthetic images after training on datasets such as MNIST or, for multi-channel images, ImageNet. However, these networks have not yet been extended to higher-dimensional data such as volumetric 3D data. Generated 3D models have various applications in entertainment and could serve as an alternative to existing procedural methods for creating graphics. This study demonstrates the capabilities of GAN-based architectures for generating practical 3D models by applying three-dimensional convolutions and deconvolutions* to voxel data.
     Goal
     • Extend GANs to 3D volumetric data, training on a single class
     • Control the shapes of the generated models, e.g. by interpolation
     *Transposed convolutions
  3. 3. 2. Training Data
     3D CAD models from ShapeNet [2]
     • Class: Chair
     • Instances: 4846
     Preprocessing
     • Voxelization: 3D CAD models are converted into binary (0 or 1) voxel grids with dimensions (32, 32, 32) [3]
     • Normalization: none is applied, since the data is already in the range [0, 1]
     • Other: bad samples are removed and the models are centred in the space
     [Figures: Training Data; Volume Distribution; Mean 3D Model]
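The centring step in the preprocessing list could look roughly like the NumPy sketch below. It assumes the model is already voxelized into a binary (32, 32, 32) array. The helper name `centre_voxels` and the bounding-box approach are illustrative assumptions, since the slides only state that models are centred, not how.

```python
import numpy as np

def centre_voxels(vox):
    """Shift the occupied voxels so their bounding box sits mid-grid.

    Hypothetical helper: bounding-box centring is an assumption,
    not something the slides specify.
    """
    vox = np.asarray(vox)
    occupied = np.argwhere(vox > 0)
    if occupied.size == 0:
        return vox.copy()  # empty grid: nothing to centre
    lo = occupied.min(axis=0)
    hi = occupied.max(axis=0) + 1  # exclusive upper corner
    size = hi - lo
    start = (np.array(vox.shape) - size) // 2  # new lower corner
    out = np.zeros_like(vox)
    src = tuple(slice(a, b) for a, b in zip(lo, hi))
    dst = tuple(slice(s, s + n) for s, n in zip(start, size))
    out[dst] = vox[src]
    return out

# Example: a 4x4x4 block placed in one corner of a 32^3 grid
grid = np.zeros((32, 32, 32), dtype=np.uint8)
grid[0:4, 0:4, 0:4] = 1
centred = centre_voxels(grid)
```

The occupied volume is unchanged by the shift; only its position in the grid moves.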
  4. 4. 3. Generative Adversarial Network
     A GAN consists of a generator G and a discriminator D. In this case, both are represented as feed-forward neural networks that are trained simultaneously.
     • Random noise vectors z are sampled from a uniform or Gaussian distribution
     Loss
     • Softmax cross-entropies based on the predictions of D
     • Separate losses for G and D defined by the minimax game
       $\min_G \max_D V(G, D) = \mathbb{E}_{x \sim P_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim P_z(z)}[\log(1 - D(G(z)))]$
     Optimal Discriminator Strategy
       $D(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_G(x)}$
     Optimization
     • Adam for both G and D
     • Learning rate of G is larger than that of D
     [Diagram: Random Noise → Generator (Linear, Deconvolution, Batch Normalization, ReLU, Sigmoid) → Generated 3D Model; Random Index → Training Data → Real 3D Model; both models → Discriminator (Convolution, Linear, Leaky ReLU) → Generated/Real Prediction]
     See Appendix for the network architecture and Adam parameters.
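As a rough illustration of the separate losses for G and D, the NumPy sketch below computes both from the discriminator's two-class outputs using softmax cross-entropy. The label convention (1 = real, 0 = generated) and the generator loss that labels generated samples as real (the commonly used non-saturating variant) are assumptions, since the slides do not state either. `gan_losses` is a hypothetical helper name.

```python
import numpy as np

REAL, FAKE = 1, 0  # assumed label convention, not given in the slides

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def gan_losses(d_logits_real, d_logits_fake):
    """Return (d_loss, g_loss) from D's logits on real and generated batches."""
    real_targets = np.full(len(d_logits_real), REAL)
    fake_targets = np.full(len(d_logits_fake), FAKE)
    # D is rewarded for telling real from generated samples.
    d_loss = (softmax_cross_entropy(d_logits_real, real_targets)
              + softmax_cross_entropy(d_logits_fake, fake_targets))
    # G is rewarded when D labels its samples as real (assumed variant).
    fooled_targets = np.full(len(d_logits_fake), REAL)
    g_loss = softmax_cross_entropy(d_logits_fake, fooled_targets)
    return d_loss, g_loss

# Example: a discriminator that cannot tell the two batches apart
uninformed = np.zeros((4, 2))
d_loss, g_loss = gan_losses(uninformed, uninformed)
```

With uninformative logits, D outputs 0.5 for every sample, which is the value the optimal discriminator takes when the generated and data distributions coincide. The discriminator loss is then log 4.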
  5. 5. Issues with GAN
     • Collapsing generator
       • G outputs similar 3D models for different inputs
     • Non-semantic input z
       • Interpolating z reveals sharp edges in the latent space, so there is no way to control the shape of the output
     Improving the GAN
     • Prevent the generator from collapsing
       • Minibatch Discrimination [4] layer in D
     • Embed semantic meaning into the input [5]
       • Concatenate additional latent codes with z before feeding it to G
       • Additional loss based on mutual information reconstruction by D
     [Diagram: Random Noise + Latent Codes → Generator (Linear, Deconvolution, Batch Normalization, ReLU, Sigmoid) → Generated 3D Model; Random Index → Training Data → Real 3D Model; both models → Discriminator (Convolution, Linear, Leaky ReLU, Minibatch Discrimination) → Generated/Real Prediction and Mutual Information Reconstruction]
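A minibatch discrimination layer in the style of [4] can be sketched in NumPy as follows. This is a forward pass only, under stated assumptions: the 1024 input features, 64 kernels, and kernel dimension 16 are taken from the appendix, while the random tensor `T` stands in for a learned parameter and the function name is hypothetical.

```python
import numpy as np

def minibatch_discrimination(features, T):
    """Append per-kernel batch similarity statistics to each sample.

    features: (N, A) activations entering the layer.
    T:        (A, B, C) tensor (learned in practice; random here).
    Returns:  (N, A + B) array.
    """
    # Project each sample onto B kernels of dimension C.
    m = np.tensordot(features, T, axes=([1], [0]))  # (N, B, C)
    # Pairwise L1 distances between samples, separately per kernel.
    l1 = np.abs(m[:, None, :, :] - m[None, :, :, :]).sum(axis=3)  # (N, N, B)
    # Similarity decays with distance; sum over the whole minibatch.
    similarity = np.exp(-l1).sum(axis=1)  # (N, B)
    return np.concatenate([features, similarity], axis=1)

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 1024))
T = rng.normal(size=(1024, 64, 16)) * 0.01  # placeholder for learned weights
out = minibatch_discrimination(features, T)

# A collapsed batch (all samples identical) yields the maximum similarity.
collapsed = np.tile(features[:1], (8, 1))
out_collapsed = minibatch_discrimination(collapsed, T)
```

The appended columns give D a direct signal of how close the samples in a batch are to each other, which is what lets it penalize a generator that emits near-identical models.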
  6. 6. Minibatch Discrimination
     Motivation
     • Prevent the generator from collapsing to a single point
     Idea
     • Reproduce the diversity in the training data
     • Add a Minibatch Discrimination layer to D, before the generated/real prediction
     • For each minibatch fed to this layer, compute the L1 distance between all pairs of input vectors
     • Add this information to the given minibatch
     Mutual Information Reconstruction
     Motivation
     • Embed semantic meaning in z
     Idea
     • Maximize the mutual information preserved for latent codes C that are passed through the networks
     Latent codes, input to G
     • C = [C1, C2, C3] (concatenation)
     • Categorical one-hot vector C1 ~ Cat(K=2, p=0.5)
     • Continuous C2 ~ Unif(-1, 1)
     • Continuous C3 ~ Unif(-1, 1)
     Reconstruction, output from D
     • Categorical: softmax cross-entropy
     • Continuous: assume a fixed variance and compute the Gaussian negative log-likelihood based on the mean
     [Diagram: generator input z, c1 (e.g. [0, 1]), c2, c3; reconstructed outputs Softmax1, μ2, μ3; Minibatch Discrimination Layer with kernels]
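The latent-code sampling and the reconstruction loss can be sketched in NumPy as below. The code distributions follow the slide. The uniform range for the noise z, the fixed standard deviation `sigma=1.0`, the equal weighting of the two loss terms, and both function names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_generator_input(batch, noise_dim=128):
    """Build [z, C1, C2, C3] with shape (batch, 128 + 2 + 2)."""
    z = rng.uniform(-1.0, 1.0, size=(batch, noise_dim))  # noise range assumed
    k = rng.integers(0, 2, size=batch)                   # C1 ~ Cat(K=2, p=0.5)
    c1 = np.eye(2)[k]                                    # one-hot encoding
    c23 = rng.uniform(-1.0, 1.0, size=(batch, 2))        # C2, C3 ~ Unif(-1, 1)
    return np.concatenate([z, c1, c23], axis=1), k, c23

def mutual_information_loss(cat_logits, mu, k, c23, sigma=1.0):
    """Reconstruction loss on D's 2 + 2 outputs for the latent codes."""
    # Categorical code: softmax cross-entropy against the sampled class.
    shifted = cat_logits - cat_logits.max(axis=1, keepdims=True)
    log_p = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    cat_loss = -log_p[np.arange(len(k)), k].mean()
    # Continuous codes: Gaussian negative log-likelihood, fixed variance,
    # with the mean predicted by D.
    nll = 0.5 * np.log(2 * np.pi * sigma ** 2) + (c23 - mu) ** 2 / (2 * sigma ** 2)
    cont_loss = nll.sum(axis=1).mean()
    return cat_loss + cont_loss  # equal weighting is an assumption

x, k, c23 = sample_generator_input(4)
# Uninformative class logits, perfectly reconstructed continuous codes
loss = mutual_information_loss(np.zeros((4, 2)), c23.copy(), k, c23)
```

With a fixed variance, the Gaussian term reduces to a squared error between the sampled code and the predicted mean plus a constant, so minimizing it pushes D to recover C2 and C3 from the generated model.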
  7. 7. 4. Results
     • Minibatch size: 128
     • Epochs: 100
     [Figures: Generated 3D Models*; 3D Volume Distributions (Learned Distribution, True Distribution); Chair-likeness; Losses]
     *The blue models are the nearest models in the training dataset
  8. 8. 5. Conclusions
     • GANs can be extended to 3D volumetric data using three-dimensional convolutions and deconvolutions
     • Smaller datasets (sparse data) lead to worse-looking, noisy models
       • Partially mitigated by mutual information reconstruction and minibatch discrimination
     • In many cases, D improves faster than G
       • Gradients backpropagated through G saturate and training stops
       • Training does not converge
     Future Work
     • Larger dataset with potentially multiple classes
     • Balance training between G and D
       • Heuristic: stop updating D while it is too strong
       • Larger G, i.e. more parameters
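One possible reading of the "stop updating D while it is too strong" heuristic is a loss threshold, sketched below. The rule itself, the function name, and the floor value of 0.3 are all hypothetical placeholders, since the slides propose the idea only as future work.

```python
def should_update_discriminator(d_loss, floor=0.3):
    """Return False while D's loss is below `floor`, i.e. D is 'too strong'.

    The floor value is an arbitrary placeholder, not taken from the slides.
    """
    return d_loss >= floor

# Toy trace of discriminator losses over five iterations
d_losses = [1.2, 0.6, 0.25, 0.1, 0.4]
updates = [should_update_discriminator(loss) for loss in d_losses]
```

In a training loop, G would still be updated on every iteration while D's optimizer step is skipped whenever the function returns False.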
  9. 9. References
     [1] Goodfellow et al. (2014). Generative Adversarial Networks. CoRR, abs/1406.2661.
     [2] Angel X. Chang et al. (2015). ShapeNet: An Information-Rich 3D Model Repository. CoRR, abs/1512.03012.
     [3] Patrick Min. Binvox, 3D Mesh Voxelizer.
     [4] Tim Salimans et al. (2016). Improved Techniques for Training GANs. CoRR, abs/1606.03498.
     [5] Xi Chen et al. (2016). InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. CoRR, abs/1606.03657.
     Appendix: GAN Architecture
     (FC: fully connected, DC: deconvolution, BN: batch normalization, lReLU: leaky ReLU)
     Generator
     • Input ∈ R^(128+2+2)
     • FC 1024, BN, ReLU
     • FC 16384, BN, ReLU
     • DC 256 → 128, Kernel 4, Stride 2, BN, ReLU
     • DC 128 → 64, Kernel 4, Stride 2, BN, ReLU
     • Output: DC 64 → 1, Kernel 4, Stride 2, BN, ReLU
     Discriminator
     • Input: 32x32x32 3D voxel data
     • Conv 1 → 64, Kernel 4, Stride 2, lReLU
     • Conv 64 → 128, Kernel 4, Stride 2, BN, lReLU
     • Conv 128 → 256, Kernel 4, Stride 2, BN, lReLU
     • FC 1024, BN, lReLU
     • Minibatch Discrimination, Kernels 64, Kernel Dimension 16
     • Output: FC 2 (Generated/Real prediction)
     • FC 256, BN, lReLU
     • Output: FC 2+2 (Mutual Information Reconstruction)
     Adam Optimizer Parameters (Generator / Discriminator)
     • α: 0.001 / 0.00005
     • β1: 0.5 / 0.5
     • β2: 0.999 / 0.999
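The layer sizes in the appendix can be checked with the standard output-size formulas for strided convolutions and transposed convolutions. The sketch below assumes a padding of 1 on every layer, which the slides do not state. It is the value that makes kernel 4 with stride 2 halve (or double) the spatial size and so matches the 32x32x32 input and the FC 16384 layer.

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Spatial output size of a strided convolution (pad=1 is assumed)."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    """Spatial output size of a transposed convolution (pad=1 is assumed)."""
    return stride * (size - 1) + kernel - 2 * pad

# Discriminator: three convolutions applied to a 32^3 input
d_sizes = [32]
for _ in range(3):
    d_sizes.append(conv_out(d_sizes[-1]))
d_flat = 256 * d_sizes[-1] ** 3  # 256 channels times the remaining volume

# Generator: FC 16384 reshaped to 256 channels of side s, with 256 * s^3 = 16384
side = next(s for s in range(1, 33) if 256 * s ** 3 == 16384)
g_sizes = [side]
for _ in range(3):
    g_sizes.append(deconv_out(g_sizes[-1]))
```

Under this assumption the discriminator shrinks the grid along 32, 16, 8, 4 and the generator mirrors it along 4, 8, 16, 32, so the generator's FC 16384 output corresponds to 256 channels of 4x4x4.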