Unsupervised learning aims to learn meaningful representations from unlabeled data that capture its intrinsic structure and can be transferred to downstream tasks. Meta-learning, whose objective is to learn to generalize across tasks so that the learned model can rapidly adapt to a novel task, shares the spirit of unsupervised learning in that both seek a more effective and efficient learning procedure than learning from scratch. The fundamental difference between the two is that most meta-learning approaches are supervised, assuming full access to the labels. However, acquiring a labeled dataset for meta-training not only is costly, as it requires human effort in labeling, but also limits applications to pre-defined task distributions. In this paper, we propose a principled unsupervised meta-learning model, namely Meta-GMVAE, based on the Variational Autoencoder (VAE) and set-level variational inference. Moreover, we introduce a Gaussian mixture (GMM) prior, assuming that each modality represents a class concept in a randomly sampled episode, which we optimize with Expectation-Maximization (EM). The learned model can then be used for downstream few-shot classification tasks, where we obtain task-specific parameters by performing semi-supervised EM on the latent representations of the support and query sets, and predict labels of the query set by computing aggregated posteriors. We validate our model on the Omniglot and Mini-ImageNet datasets by evaluating its performance on downstream few-shot classification tasks. The results show that our model obtains impressive performance gains over existing unsupervised meta-learning baselines, even outperforming supervised MAML in a certain setting.
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning
1. Meta-GMVAE: Mixture of Gaussian VAEs for
Unsupervised Meta-Learning
Dong Bok Lee1, Dongchan Min1, Seanie Lee1, and Sung Ju Hwang1,2
KAIST1, AITRICS2, Seoul
2. Introduction
Unsupervised learning aims to learn meaningful representations from unlabeled data
that can be transferred to downstream tasks.
[Figure: an image passes through unsupervised learning to a representation, which transfers to downstream tasks such as image recognition and image segmentation]
3. Introduction
Meta-learning shares the spirit of unsupervised learning in that both seek to learn a more effective learning procedure than learning from scratch.
[Figure: unsupervised learning yields a representation for downstream tasks (image recognition, image segmentation), while meta-learning trains a model across tasks such as Task 1: Tiger vs. Cat and Task 2: Car vs. Bicycle]
4. Introduction
The fundamental difference between the two is that most meta-learning approaches are supervised, assuming full access to the labels.
Prototypical Network [1]
[1] Jake Snell, Kevin Swersky, and Richard S. Zemel: “Prototypical Networks for Few-shot Learning”, NeurIPS 2017
[2] Chelsea Finn, Pieter Abbeel, and Sergey Levine: “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”, ICML 2017
MAML [2]
Supervision
5. Introduction
Due to the assumption of supervision, existing meta-learning methods have a limitation:
they require massive amounts of human effort for labeling.
[Figure: hand-labeling images as Cat, Dog, and Car]
6. Introduction
To overcome this, two recent works have proposed unsupervised meta-learning:
[Figure: an unlabeled dataset feeds unsupervised meta-training over Tasks 1…N, each with a support set and a query set, followed by a supervised meta-test]
7. Introduction
They focus on constructing a supervised meta-training dataset by pseudo-labeling with clustering or augmentation.
[Figure: the same pipeline, with pseudo-labeling applied to the unlabeled dataset before unsupervised meta-training over Tasks 1…N and the supervised meta-test]
8. Method
In this work, we have focused on developing a principled unsupervised meta-learning
method, namely Meta-GMVAE.
The main idea is to bridge the gap between the process of unsupervised meta-training
and that of supervised meta-test.
9. Method (Unsupervised Meta-training)
Specifically, we start from the Variational Autoencoder [5], whose prior is modeled as a Gaussian mixture.
The assumption is that each modality can represent a label at meta-test time.
Its generative process is as follows:
1. Sample a mixture component: y ~ Cat(π)
2. Sample a latent variable: z | y ~ N(μ_y, Σ_y)
3. Decode an observation: x | z ~ p_θ(x | z)
[5] Diederik P. Kingma and Max Welling: “Auto-Encoding Variational Bayes”, ICLR 2014
A graphical illustration of GMVAE
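The generative process above can be written compactly as a joint density. This is a sketch in standard GMVAE notation; the symbols here are my assumptions, not necessarily the paper's exact ones:

```latex
p(x, z, y) = p(y)\, p(z \mid y)\, p_\theta(x \mid z),
\qquad y \sim \mathrm{Cat}(\pi),
\quad z \mid y \sim \mathcal{N}(\mu_y, \Sigma_y).
```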
10. Method (Unsupervised Meta-training)
However, the difference from the previous work on GMVAE [6] is that they fix the prior parameters, since they target single-task learning.
[6] Nat Dilokthanakul et al.: “Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders”, arXiv 2016
1. Sample a mixture component: y ~ Cat(π)
2. Sample a latent variable: z | y ~ N(μ_y, Σ_y)
3. Decode an observation: x | z ~ p_θ(x | z)
A graphical illustration of GMVAE: the prior part (steps 1 and 2) is fixed
11. Method (Unsupervised Meta-training)
To learn the set-dependent multi-modalities, we assume that there exists a prior parameter for each randomly drawn episode dataset.
A graphical illustration of Meta-GMVAE
A graphical illustration of GMVAE
15. Method (Unsupervised Meta-training)
Here we model a set-dependent posterior that encodes each example within the given dataset into the latent space.
Specifically, this is implemented with the Transformer encoder proposed by Vaswani et al. [7].
[7] Ashish Vaswani et al.: “Attention Is All You Need”, NeurIPS 2017
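As a hedged illustration (not the authors' implementation), a single self-attention layer, the building block of the Transformer encoder, mixes information across set elements while staying permutation-equivariant, which is the property needed to encode an unordered episode. All names and shapes below are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a set X of shape (n, d).
    Without positional encodings it is permutation-equivariant:
    permuting the rows of X permutes the output rows the same way."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # (n, n) attention weights
    return A @ V
```

Because the attention weights depend only on pairwise similarities, reordering the inputs simply reorders the outputs, so the encoder treats the episode as a set rather than a sequence.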
18. Method (Unsupervised Meta-training)
In our setting, the prior parameter characterizes the given dataset.
To obtain the parameter that optimally explains the given dataset, we propose to locally maximize the lower bound with respect to the prior parameter.
This leads to the MLE solution of the prior distribution.
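A hedged sketch of this step (symbols are my assumptions): only the cross-entropy term of the lower bound depends on the prior parameters ψ, so maximizing over ψ reduces to fitting a GMM to the latent samples by maximum likelihood:

```latex
\max_{\psi}\;\; \mathbb{E}_{q_\phi(z \mid X)}\!\left[\log p(z; \psi)\right],
\qquad
p(z; \psi) = \sum_{k=1}^{K} \pi_k\,\mathcal{N}(z; \mu_k, I).
```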
19. Method (Unsupervised Meta-training)
However, we do not have an analytic MLE solution of GMM.
To this end, we propose to obtain optimal using EM algorithm as follows:
where we tie the covariance matrix with identity matrix.
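The EM updates with identity covariances can be sketched as follows. This is a minimal illustration of the algorithm the slides describe, not the authors' code; the initialization heuristic and all names are assumptions:

```python
import numpy as np

def em_gmm_identity(Z, K, n_iter=50):
    """EM for a Gaussian mixture whose covariances are tied to the
    identity matrix. Z holds latent samples of shape (n, d)."""
    n, d = Z.shape
    mu = Z[:: max(1, n // K)][:K].copy()  # simple spread-out init over the data
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] ∝ pi_k * N(z_i; mu_k, I)
        logits = np.log(pi) - 0.5 * ((Z[:, None, :] - mu[None]) ** 2).sum(-1)
        logits -= logits.max(axis=1, keepdims=True)
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: closed-form updates of mixture weights and means
        Nk = r.sum(axis=0)
        pi = Nk / n
        mu = (r.T @ Z) / Nk[:, None]
    return pi, mu, r
```

With the covariances fixed, only the weights and means are re-estimated, which keeps each episode's inner optimization cheap.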
20. Method (Unsupervised Meta-training)
Then the training objective of Meta-GMVAE is the VAE lower bound in which the prior parameters are obtained by performing the EM algorithm on the latent Monte Carlo samples.
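A hedged sketch of a per-episode objective consistent with this description (the notation, including the set-conditioned posterior, is my assumption): with ψ*(X) the EM solution on the latent samples of episode X, train θ and φ by maximizing

```latex
\mathcal{L}(\theta, \phi; X) =
\sum_{x \in X}\Big(
\mathbb{E}_{q_\phi(z \mid x, X)}\!\left[\log p_\theta(x \mid z)\right]
- \mathrm{KL}\!\left(q_\phi(z \mid x, X)\,\big\|\,p\big(z; \psi^*(X)\big)\right)\Big).
```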
21. Method (Supervised Meta-test)
However, there is no guarantee that each modality obtained by the EM algorithm corresponds to a label.
To overcome this, we use the support set as observations for the EM algorithm.
A graphical illustration of predicting labels
A graphical illustration of Meta-GMVAE
22. Method (Supervised Meta-test)
Then, we perform semi-supervised EM to obtain the prior parameters, where the support set is treated as labeled data.
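The semi-supervised variant can be sketched by clamping the support responsibilities to the one-hot labels while inferring the query responsibilities. Again a hedged illustration under assumed names and shapes, with covariances kept tied to the identity as in the unsupervised case:

```python
import numpy as np

def semi_supervised_em(Z_s, y_s, Z_q, K, n_iter=20):
    """Semi-supervised EM: Z_s/y_s are support latents and labels,
    Z_q are query latents; support responsibilities stay one-hot."""
    R_s = np.eye(K)[y_s]                       # fixed one-hot labels (support)
    Z = np.vstack([Z_s, Z_q])
    mu = (R_s.T @ Z_s) / R_s.sum(0)[:, None]   # init means = support class means
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step on the query set only; the support part is clamped
        logits = np.log(pi) - 0.5 * ((Z_q[:, None, :] - mu[None]) ** 2).sum(-1)
        logits -= logits.max(axis=1, keepdims=True)
        R_q = np.exp(logits)
        R_q /= R_q.sum(axis=1, keepdims=True)
        # M-step over support and query jointly
        R = np.vstack([R_s, R_q])
        Nk = R.sum(axis=0)
        pi = Nk / len(Z)
        mu = (R.T @ Z) / Nk[:, None]
    return pi, mu, R_q
```

Clamping the support responsibilities is what anchors each mixture component to a class, resolving the label-to-modality ambiguity of the fully unsupervised EM.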
23. Method (Supervised Meta-test)
Finally, we predict the labels of the query set using the aggregated posterior, where we reuse the Monte Carlo samples.
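A hedged sketch of this prediction rule (notation assumed): averaging the component posterior over the S Monte Carlo samples of each query latent and taking the argmax,

```latex
\hat{y}_q = \arg\max_{k}\;
\frac{1}{S} \sum_{s=1}^{S}
\frac{\pi_k\,\mathcal{N}\!\big(z_q^{(s)}; \mu_k, I\big)}
     {\sum_{k'} \pi_{k'}\,\mathcal{N}\!\big(z_q^{(s)}; \mu_{k'}, I\big)},
\qquad z_q^{(s)} \sim q_\phi(z \mid x_q).
```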
24. Method (SimCLR)
The ability to generate samples may not be necessary for discriminative tasks.
Moreover, it is known to be challenging to train generative models on real-world data.
Therefore, we propose to use features pretrained with SimCLR [8] as inputs.
[8] Ting Chen et al.: “A Simple Framework for Contrastive Learning of Visual Representations”, ICML 2020
A graphical illustration of SimCLR; image recognition accuracy on ImageNet
25. Experimental Setups (Dataset)
We ran experiments on two benchmark datasets:
1) Omniglot dataset
28 × 28 resolution
grayscale
1,200 meta-training classes
423 meta-test classes
26. Experimental Setups (Dataset)
We ran experiments on two benchmark datasets:
2) Mini-ImageNet dataset
84 x 84 resolution
RGB
64 meta-training classes
20 meta-test classes
27. Experimental Setups (Baselines)
We compare our Meta-GMVAE with four baselines as follows:
1) Prototypical Networks (oracle) [1]: metric-based meta-learning with supervision.
2) MAML (oracle) [2]: gradient-based meta-learning with supervision.
3) CACTUS [3]: constructing pseudo tasks by clustering in a deep embedding space.
4) UMTRA [4]: constructing pseudo tasks using augmentation.
[1] Jake Snell, Kevin Swersky, and Richard S. Zemel: “Prototypical Networks for Few-shot Learning”, NeurIPS 2017
[2] Chelsea Finn, Pieter Abbeel, and Sergey Levine: “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”, ICML 2017
[3] Kyle Hsu, Sergey Levine, and Chelsea Finn: “Unsupervised Learning via Meta-Learning”, ICLR 2019
[4] Siavash Khodadadeh, Ladislau Bölöni, and Mubarak Shah: “Unsupervised Meta-Learning for Few-Shot Image Classification”, NeurIPS 2019
28. Experimental Results (Few-shot Classification)
Meta-GMVAE obtains better performance than existing unsupervised methods in 5 settings and matches their performance in the other 3 settings.
The few-shot classification results (way, shot) on the Omniglot and Mini-ImageNet datasets
29. Experimental Results (Few-shot Classification)
Interestingly, Meta-GMVAE even obtains better performance than MAML in a certain setting (i.e., (5, 1) on Omniglot), while utilizing as few as 0.1% of the labels.
The few-shot classification results (way, shot) on the Omniglot and Mini-ImageNet datasets
30. Experimental Results (Visualization)
The figures below show how Meta-GMVAE learns and realizes class concepts.
At meta-training time, Meta-GMVAE captures similar visual structures but not class concepts.
However, it easily recovers the class concepts at meta-test time.
The samples obtained and generated for each mode of Meta-GMVAE
31. Experimental Results (Ablation Study)
We conduct an ablation study on Meta-GMVAE by eliminating each component.
The results below show that all the components are critical to the performance.
The results of ablation study on Meta-GMVAE
32. Experimental Results (Cross-way)
In more realistic settings, we cannot know the way (the number of classes) at meta-test time.
Meta-GMVAE also shows its robustness on the cross-way classification experiments.
The results of cross-way 1-shot experiments on Omniglot dataset
33. Experimental Results (Cross-way Visualization)
We further visualize the latent space for the cross-way generalization experiment.
The figure below shows that Meta-GMVAE trained with 20-way tasks can cluster a 5-way task.
The visualization of the latent space
34. Conclusion
1. We propose a principled unsupervised meta-learning model, namely Meta-GMVAE, which meta-learns the set-conditioned prior and posterior network for a VAE.
2. We propose to learn the multi-modal structure of a given dataset with the Gaussian
mixture prior, such that it can adapt to a novel dataset via the EM algorithm.
3. We show that Meta-GMVAE largely outperforms relevant unsupervised meta-learning baselines on two benchmark datasets, while obtaining even better performance than a supervised meta-learning model under a specific setting.