1) The document discusses several papers on deep learning models that use information theory concepts like mutual information and variational information bottleneck.
2) It describes the deep variational information bottleneck model, which learns representations that maximize mutual information with the labels while minimizing information about the inputs.
3) Other models discussed aim to learn disentangled and invariant representations by regularizing the information contained in the weights through techniques like information dropout.
2. DEEP VARIATIONAL INFORMATION BOTTLENECK
• Tishby et al. (1999): the information bottleneck (IB); deep robustness
• VIB
• Entropy regularization (Pereyra et al., 2017)
• Markov chain X → Z → Y (encoder X → Z, decoder Z → Y)
Alexander A. Alemi et al. (Google Research), ICLR 2017
The second term encourages Z to forget X: it forces Z to act
like a minimal sufficient statistic of X for predicting Y.
Variational approximation and re-parameterization trick
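As a concrete sketch of the reparameterization trick and the KL ("forget X") term of the variational bound, here is a NumPy toy with a hypothetical linear encoder (not the paper's architecture; the weights are random stand-ins for a trained network):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_sigma):
    """Hypothetical linear encoder producing the mean and std of p(z|x)."""
    mu = x @ W_mu
    sigma = np.log1p(np.exp(x @ W_sigma))  # softplus keeps the std positive
    return mu, sigma

def reparameterize(mu, sigma):
    """z = mu + sigma * eps with eps ~ N(0, I), so the sample is a
    deterministic, differentiable function of mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

def kl_to_standard_normal(mu, sigma):
    """Per-example KL(N(mu, diag(sigma^2)) || N(0, I)) -- the beta-weighted
    compression term of the variational bound."""
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma), axis=1)

# Toy usage with random weights in place of a trained encoder.
x = rng.standard_normal((4, 8))
W_mu = 0.1 * rng.standard_normal((8, 2))
W_sigma = 0.1 * rng.standard_normal((8, 2))
mu, sigma = encode(x, W_mu, W_sigma)
z = reparameterize(mu, sigma)
kl = kl_to_standard_normal(mu, sigma)
```

In training, the KL term would be added to the classification loss with weight β, and gradients flow through mu and sigma because the noise eps is sampled independently of them.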
Permutation-invariant MNIST: learned feature mappings and error for different values of β
Target maximization function
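The target objective referred to above is, in the paper's notation, the IB objective together with the practical variational bound that is minimized during training (β weights the compression term):

```latex
% IB objective to be maximized
R_{IB}(\theta) = I(Z, Y; \theta) - \beta\, I(Z, X; \theta)

% variational bound minimized in practice (reparameterized)
L = \frac{1}{N} \sum_{n=1}^{N}
    \mathbb{E}_{\epsilon \sim p(\epsilon)}
      \big[ -\log q\big(y_n \mid f(x_n, \epsilon)\big) \big]
    + \beta\, \mathrm{KL}\big( p(Z \mid x_n) \,\Vert\, r(Z) \big)
```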
Alemi, A., Fischer, I., Dillon, J., Murphy, K. (2016). Deep Variational Information Bottleneck. arXiv preprint, cs.LG.
3. DEEP VARIATIONAL INFORMATION BOTTLENECK (cont.)
A low β allows a large I(Z, X), and a large I(Z, X) causes overfitting on the test set, with I(Z, Y) decreasing.
• Future directions: open-universe classification, sequence prediction
• Connection to VAE
Considering an unsupervised version of the IB objective recovers the VAE loss.
The aim is to take our data X and maximize the mutual information contained in some encoding Z,
while restricting how much information we allow the representation to contain about the identity of each
data element in our sample (i).
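Schematically, that unsupervised objective becomes a β-weighted VAE-style loss, with β = 1 recovering the standard VAE evidence lower bound and general β the β-VAE:

```latex
\min \; \mathbb{E}_{x}\Big[
    \mathbb{E}_{z \sim p(z \mid x)}\big[ -\log q(x \mid z) \big]
    + \beta\, \mathrm{KL}\big( p(z \mid x) \,\Vert\, r(z) \big)
\Big]
```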
• Relationship between I(Z, X) and I(Z, Y), and between β and I(Z, X)
4. Information Dropout: Learning Optimal Representations Through Noisy Computation
• IB and dropout
• Information Dropout (a VIB-style dropout) + TC term
• TC-VAE
Alessandro Achille and Stefano Soatto, IEEE TPAMI 2018
• IB Lagrangian
• Approximation of noise injection (log-normal distribution)
• Disentanglement by measuring the total correlation
Minimizing the TC term is intractable in general, but choosing β = γ makes it easy to solve.
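For reference, the total correlation of a representation z is the KL divergence between the joint and the product of its marginals; it is zero exactly when the components z_i are independent, which is the sense of disentanglement measured here:

```latex
\mathrm{TC}(z) = \mathrm{KL}\Big( p(z) \,\Big\Vert\, \prod_i p(z_i) \Big)
```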
Stochastic dropout
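A minimal NumPy sketch of the multiplicative log-normal noise injection (in the paper the noise scale α is a learned function of the input; here it is a fixed constant for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def information_dropout(h, alpha):
    """Multiply activations by log-normal noise: eps = exp(alpha * n), n ~ N(0, I).

    A larger alpha injects more noise, reducing the information the layer
    output carries about its input -- the IB-style bottleneck effect.
    """
    noise = np.exp(alpha * rng.standard_normal(h.shape))
    return h * noise

h = np.ones((3, 5))
out_clean = information_dropout(h, alpha=0.0)  # no noise: output equals input
out_noisy = information_dropout(h, alpha=0.5)  # stochastic output
```

Unlike Bernoulli dropout, the noise is always strictly positive, and its variance (controlled by α) is what implements the information penalty.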
Achille, A., Soatto, S. (2017). Information Dropout: Learning Optimal Representations Through Noisy Computation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12), 2897-2905.
8. Emergence of Invariance and Disentanglement in Deep Representations
• Information decomposition of cross entropy
Alessandro Achille and Stefano Soatto, Journal of Machine Learning Research 2018
• To prevent overfitting, a constraint on the information in the weights is added
Intrinsic error: the error in predicting the label that remains even if we knew the underlying data distribution
Sufficiency: how much information the dataset carries about the parameter theta, as measured from the weights
Efficiency: the efficiency of the model and of the class of functions with respect to which the loss is optimized
Overfitting: information that is uninformative about the underlying data distribution, memorized in the weights
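Schematically (suppressing the exact conditioning used in the paper), the four terms above decompose the expected cross-entropy loss as follows; note the minus sign on the last term, meaning the loss can be reduced by memorizing the dataset, which is precisely what the information constraint prevents:

```latex
H_{p,q}(y \mid x, w)
  = \underbrace{H(y \mid x, \theta)}_{\text{intrinsic error}}
  + \underbrace{I(\theta; \mathcal{D})}_{\text{sufficiency}}
  + \underbrace{\mathbb{E}\,\mathrm{KL}(p \,\Vert\, q)}_{\text{efficiency}}
  - \underbrace{I(\mathcal{D}; w \mid \theta)}_{\text{overfitting}}
```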
• Flat minima have low information
Since the second term is intractable, we use the general upper bound below.
Networks with low information in the weights realize invariant and disentangled representations.
Therefore, invariance and disentanglement emerge naturally when training a network with
implicit (SGD) or explicit (IB Lagrangian) regularization, and are related to flat minima.
Achille, A., Soatto, S. (2017). Emergence of Invariance and Disentanglement in Deep Representations. arXiv preprint, cs.LG.