Multi-Domain Image Completion
for Random Missing Input Data
Stanford University, NVIDIA, and National Institutes of Health
Yonsei University Severance Hospital CCIDS
Choi Dongmin
Introduction
• Multi-domain images could provide complementary knowledge

- ex. Four MRI modalities (T1, T1CE, T2, FLAIR) provide distinct features to locate tumor
boundaries from different diagnostic perspectives

- ex. Person re-identification across different cameras or times

• However, some image domains might be missing in practice

- Solution 1. Nearest neighbor approach : lack of semantic consistency

- Solution 2. Generative models

• ReMIC (Representational disentanglement schemes for Multi-domain
Image Completion)

- n-to-n image completion framework

- can be utilized for high-level tasks by joint training (ex. segmentation)

- completes the missing domains given randomly distributed numbers of visible domains

- consistent performance improvement on three datasets
Related Works
Image-to-Image

Translation
J.Y Zhu et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017
• CycleGAN

- Impressive performance via cycle-consistency loss

- only 1-to-1 mapping
Related Works
Image-to-Image

Translation
Y Choi et al. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. CVPR 2018

J Yoon et al. RadialGAN: Leveraging multiple datasets to improve target-specific predictive models using GANs. ICML 2018
• StarGAN & RadialGAN

- Multi-domain image generation

- only 1-to-n mapping (generation is always conditioned on the single input
image as the only source domain)
Related Works
Image-to-Image

Translation
D Lee et al. CollaGAN: Collaborative GAN for Missing Image Data Imputation. CVPR 2019
• CollaGAN

- Collaborative model to incorporate multiple domains for generating one
missing domain

- only n-to-1 mapping
Related Works
Image-to-Image

Translation
Red boxes are missing-domain images
Only ReMIC is an n-to-n mapping
Related Works
Learning

Disentangled

Representations
• Learning Disentangled Representations

- to capture the full distribution of possible outputs by introducing a random style code

- to transfer information across domains for adaptation

- InfoGAN and β-VAE learn disentangled representations in an unsupervised manner
Xi Chen et al. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. NIPS 2016

I Higgins et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. ICLR 2017
https://www.slideshare.net/NaverEngineering/ss-96581209
Related Works
Learning

Disentangled

Representations
• DRIT and MUNIT

- disentangle content and attribute features in image translation

- However, only 1-to-1 translation
H.Y Lee et al. Diverse image-to-image translation via disentangled representations. ECCV 2018

X Huang et al. Multimodal Unsupervised Image-to-Image Translation. ECCV 2018
Related Works
Learning

Disentangled

Representations
• Liu et al.

- tackles multi-domain learning with a cross-domain latent code

- less discussion of the domain-specific style code
Liu et al. A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation. NIPS 2018
Related Works Medical

Image Synthesis
• Previous works also discuss how to extract representations from multiple
modalities, especially for segmentation with missing modalities

- However, they fuse features from multiple modalities, not from the perspective of
representation disentanglement
V Nguyen et al. Cross-domain synthesis of medical images using efficient location-sensitive deep network. MICCAI 2015

M Havaei et al. HeMIS: Hetero-Modal Image Segmentation. MICCAI 2016

A Chartsias et al. Multimodal MR synthesis via modality-invariant latent representation. IEEE Transactions on Medical Imaging 2017
Method
Red boxes are

missing-domain images
Method
- Image decomposition

: Shared content structure (skeleton) + Unique characteristics (flesh)

- Missing image reconstruction during testing

: Shared skeleton from available domains + Sampled flesh from the learned model

Content code (Shared)

- Content encoder E^c(x_1, x_2, …, x_N) = c

Style code (Domain-specific)

- Style encoder E^s_i(x_i) = s_i (1 ≤ i ≤ N)
Method
- Content code visualization (randomly selected 8 out of 256 channels) on BraTS

: Different channel-wise feature maps focus on different anatomical structures
(ex. tumor, brain, skull)
Input images
Method
- Generation : Style code s_i sampled from a prior distribution + Content code c

- G_i(c, s_i) = x̃_i
Image Generation Process
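The encode-and-generate cycle above can be sketched with toy linear stand-ins. All shapes, weights, and the zero-filling of the missing input are illustrative assumptions here; the actual E^c, E^s_i, and G_i are convolutional networks described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C_DIM, S_DIM = 4, 64, 16, 8  # domains, flattened image size, code sizes (toy values)

# Toy linear stand-ins for the networks (the real ones are convolutional).
Wc = rng.standard_normal((N * D, C_DIM)) * 0.1            # shared content encoder E^c
Ws = [rng.standard_normal((D, S_DIM)) * 0.1 for _ in range(N)]        # style encoders E^s_i
Wg = [rng.standard_normal((C_DIM + S_DIM, D)) * 0.1 for _ in range(N)]  # generators G_i

def encode_content(xs):
    """E^c takes ALL domains jointly and returns one shared content code c."""
    return np.concatenate(xs) @ Wc

def encode_style(x, i):
    """E^s_i returns the domain-specific style code s_i."""
    return x @ Ws[i]

def generate(c, s, i):
    """G_i(c, s_i) = x̃_i : complete domain i from content + style."""
    return np.concatenate([c, s]) @ Wg[i]

# Test-time completion: domain 2 is missing (zero-filled input, an assumption here),
# content comes from the visible domains, style is sampled / fixed at 0.5 (Appendix A.1).
xs = [rng.standard_normal(D) for _ in range(N)]
xs[2] = np.zeros(D)
c = encode_content(xs)
s2 = np.full(S_DIM, 0.5)
x2_tilde = generate(c, s2, 2)
```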
Method
Segmentation Branch
- Segmentation generator G^S attached after the content codes

- Assumption : The content codes contain essential image structure information

- Joint training (generation loss + segmentation Dice loss)

: adaptively learn how to generate missing images
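The segmentation Dice loss mentioned above can be sketched in NumPy. This is a minimal soft-Dice form; the paper's exact smoothing and multi-class handling may differ.

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩G| / (|P| + |G|).

    pred may be a soft probability map; target is the binary ground-truth mask.
    eps avoids division by zero when both masks are empty.
    """
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)
```

A perfect prediction gives a loss near 0; fully disjoint masks give a loss near 1.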
Method
Training Loss
Image Consistency Loss
Latent Consistency Loss
Adversarial Loss
Reconstruction Loss
Segmentation Loss
Method
Training Loss

Total Loss = weighted sum of the loss terms :

- Adversarial (λ_adv = 1)

- Image Consistency (λ^x_cyc = 10)

- Style Latent Consistency (λ^s_cyc = 1)

- Content Latent Consistency (λ^c_cyc = 1)

- Reconstruction (λ_rec = 20)

- Segmentation (λ_seg = 1)
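With the λ values listed on this slide, the total objective is a plain weighted sum. A minimal sketch (the term names in the dictionary are my shorthand, not identifiers from the paper):

```python
# λ weights as listed on the slide; the exact loss-term definitions are in the paper.
LAMBDAS = {"adv": 1.0, "x_cyc": 10.0, "s_cyc": 1.0, "c_cyc": 1.0, "rec": 20.0, "seg": 1.0}

def total_loss(terms, lambdas=LAMBDAS):
    """Weighted sum of the individual loss terms: L = Σ_k λ_k · L_k."""
    return sum(lambdas[k] * v for k, v in terms.items())
```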
Experiments
• BraTS 2018 dataset

- Multi-modal brain MRI with four modalities : T1, T1Gd, T2, FLAIR

- Following CollaGAN, 218 training and 28 testing samples randomly selected

- A set of 2D slices (40,148 training / 5,340 test) extracted from 3D volumes

- Resized to 256 × 256

- Three tumor categories

: Enhancing tumor (ET), tumor core (TC), and whole tumor (WT)
D Lee et al. CollaGAN: Collaborative GAN for Missing Image Data Imputation. CVPR 2019

B.H Menze et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE TMI
Experiments
• ProstateX dataset

- Multi-parametric prostate MR scans for 98 subjects : T2, ADC, HighB

- 78 training and 20 testing samples randomly selected

- A set of 2D slices (3,540 training / 840 test) extracted from 3D volumes

- Resized to 256 × 256

- Prostate regions are manually labeled as the whole prostate (WP)
G Litjens et al. Computer-aided detection of prostate cancer in MRI. IEEE TMI
Experiments
• RaFD (Radboud Faces Database)

- Eight facial expressions

: neutral, angry, contemptuous, disgusted, fearful, happy, sad, and surprised

- Following StarGAN, adopt images from three camera angles with three gaze
directions

- 3,888 training (54 participants) / 936 test (13 participants)

- Cropped with the face in the center and resized to 128 × 128
Y Choi et al. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. CVPR 2018
http://www.socsci.ru.nl:8180/RaFD2/RaFD
Results
• Multi-Domain Image Completion on N Domains

- Only One Missing Domain (n-to-1)

* Training : the one missing domain is randomly distributed

* Testing : fix the one missing domain and generate outputs only on that domain

- More than One Missing Domain (n-to-n)

* Training : k randomly selected visible domains (k ∈ {1, …, N − 1})

* Testing : fix k while the k visible domains are randomly selected;
evaluate all the generated images

- Evaluation metrics

* NRMSE (Normalized Root Mean Squared Error)

* SSIM (Structural Similarity)

* PSNR (Peak Signal-to-Noise Ratio)
N
Results
• Multi-Domain Image Completion

- Comparison with MUNIT, StarGAN, and CollaGAN

- ReMIC w/o Recon : ReMIC without the reconstruction loss (single missing domain)

- ReMIC-Random : k randomly selected visible domains (multiple missing domains, k = *)
Results
• Multi-Domain Image Completion

- Comparison with MUNIT, StarGAN, and CollaGAN

- ReMIC w/o Recon : ReMIC without the reconstruction loss (single missing domain)

- ReMIC-Random : k randomly selected visible domains (multiple missing domains, k = *)
Results
• Multi-Domain Segmentation

- Oracle : fully supervised 2D U-Net variant trained without missing images

- Oracle+* : the missing images are generated by the "*" method and evaluated

with the pre-trained "Oracle" model (All : without any missing domains)

- ReMIC+Seg : separate content encoders for the image generation and

segmentation tasks

- ReMIC+Joint : sharing the weights of the content encoder for the two tasks
Conclusion
• A general framework for multi-domain image completion, given
that one or more input domains are missing

• Learning shared content and domain-specific style encoding
across multiple domains

• Well generalized to both natural and medical images

• Extended to a unified image generation and segmentation
framework for the missing-domain segmentation task
Question
• According to this paper, “different modalities provide distinct
features to locate tumor boundaries from differential diagnosis
perspectives”.

But ReMIC uses the content code, which encodes the shared
skeleton, as the input to the attached segmentation generator.
Isn't it a contradiction?
ICLR 2020 Reviews
• The main contribution is representational disentanglement,
namely the content-style separation, but there is no explicit
evidence that this separation really happens

• Evaluation on a high-resolution dataset such as CelebA-HQ and with
other conventional metrics such as FID is requested
https://openreview.net/forum?id=rkg_wREYDS
Appendix. A : Implementation Details
• A.1 Hyperparameters

- Adam optimizer (β1 = 0.5, β2 = 0.999)

- Batch size 1 and 100,000 iterations

- Style code dimension : 8

- During testing, a fixed style code of 0.5 in each dimension

• A.2 Network Architectures (check details in the paper)

- ReMIC is developed on the backbone of MUNIT

- Unified Content Encoder : down-sampling module + residual blocks (IN)

- Style Encoder : down-sampling module + residual blocks + GAP + FC

- Generator : four residual blocks + up-sampling + AdaIN*

- Discriminator : four convolutional blocks

- Segmentor : U-Net-shaped network
X Huang et al. Arbitrary style transfer in real-time with adaptive instance normalization. ICCV 2017
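AdaIN, used in the generator's residual blocks, can be sketched as below. Here the per-channel style statistics are passed in directly as arrays; in MUNIT-style models (and presumably ReMIC) an MLP predicts them from the style code, which this sketch omits.

```python
import numpy as np

def adain(content, style_mean, style_std, eps=1e-5):
    """Adaptive Instance Normalization on a (C, H, W) feature map:
    normalize each channel to zero mean / unit std, then rescale and
    shift with the style statistics."""
    mu = content.mean(axis=(1, 2), keepdims=True)
    sigma = content.std(axis=(1, 2), keepdims=True)
    normalized = (content - mu) / (sigma + eps)
    return style_std.reshape(-1, 1, 1) * normalized + style_mean.reshape(-1, 1, 1)
```

After AdaIN, each channel of the output carries the style's mean and standard deviation while keeping the content's spatial structure.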
Appendix. C : Extended Ablation Study and
Results for Multi-domain Segmentation
• C.4 Analysis of missing-domain segmentation results
Thank you
