1
Learning Discrete Representations via
Information Maximizing
Self-Augmented Training
Weihua Hu, Takeru Miyato, Seiya Tokui,
Eiichi Matsumoto, Masashi Sugiyama
Intelligent Information Processing II
Nov 20, 2017
University of Tokyo, RIKEN AIP, Preferred Networks, Inc.
Proceedings of the 34th International Conference on Machine Learning
Presented by Shunsuke KITADA
The reason why I chose this paper
● Achieved high accuracy (98%!) on MNIST classification
with unsupervised learning.
● Published by the University of Tokyo (Sugiyama lab)
and Preferred Networks.
● VAT is used as an effective regularization term.
● Accepted at ICML 2017.
2
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion
3
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion
4
Introduction
● Unsupervised discrete representation Learning
5
○ To obtain a function that maps similar (or dissimilar) data into
similar (or dissimilar) discrete representations.
○ The similarity of data is defined according to the applications of
interest.
Introduction
● Clustering and Hash learning
6
○ Clustering
■ Widely applied to data-driven application
domains. [Berkhin 2006]
○ Hash learning
■ Popular for approximate nearest neighbor search in
large-scale information retrieval. [Wang+ 2016]
Introduction
● Development of Deep neural networks
7
○ Scalability and flexibility
■ They can learn complex features and non-linear
decision boundaries.
○ Their model complexity is huge
■ Regularization of the networks is crucial for learning
meaningful representations of data.
Introduction
● In unsupervised representation learning
8
○ Target representations are not provided.
○ There are no constraining conditions.
➔ We need to regularize the networks in order to learn useful
representations that exhibit the intended invariance for
applications of interest.
◆ e.g., invariance to small perturbations or affine transformations
Introduction | In this paper
● Use data augmentation to model the invariance of
learned data representations
9
○ Map data points into their discrete representations by a deep
neural network.
○ Regularize it by encouraging its prediction to be invariant to data
augmentation.
10
Information Maximizing
Self-Augmented Training
● Regularized Information
Maximization (RIM)
Maximize the information-theoretic
dependency between inputs and their
mapped outputs, while regularizing the
mapping function.
● Self-Augmented Training
(SAT)
Encourage the predicted
representations of augmented data
points to be close to those of the original
data points in an end-to-end fashion.
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion
11
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion
12
Related work | Clustering & Hash Learning
● The representative clustering and hashing methods
○ K-means clustering and hashing [He+ 2013]
○ Gaussian mixture model clustering, iterative quantization [Gong+ 2013]
○ Minimal-loss hashing [Norouzi & Blei 2011]
13
These methods can only model linear boundaries between
different representations.
Related work | Clustering & Hash Learning
● Methods that can model the non-linearity of data
○ Kernel-based [Xu+ 2014; Kulis & Darrell 2009]
○ Spectral clustering [Xu+ 2014; Kulis & Darrell 2009]
14
They are difficult to scale to large datasets.
Related work | Clustering & Hash Learning
● Deep learning based approach
○ Clustering
15
■ Jointly learn feature representations and
cluster assignments [Xie+ 2016]
■ Model the data generation process by using deep
generative models with Gaussian mixture models as the
prior distribution [Dilokthanakul+ 2016; Zheng+ 2016]
Related work | Clustering & Hash Learning
● Deep learning based approach
○ Hash learning
16
■ Supervised hash learning
[Xia+ 2014; Lai+ 2015; Zhang+ 2015; Xu+2015; Li+ 2015]
■ Unsupervised hash learning
● Stacked RBM [Salakhutdinov & Hinton 2009]
● Use DL for the mapping function [Erin Liong+ 2015]
Related work | Clustering & Hash Learning
● Deep learning based approach
○ Hash learning
17
■ These unsupervised methods did not explicitly
impose the intended invariance on the learned
representations.
■ The predicted representations may therefore not be
useful for applications of interest.
Related work | Data Augmentation
● About data augmentation
○ In supervised and semi-supervised learning
18
■ Applying data augmentation to a supervised learning problem
is equivalent to adding a regularization term to the original cost
function. [Leen 1995]
■ Applying data augmentation to semi-supervised learning
achieves state-of-the-art performance.
[Bachman+ 2014; Miyato+ 2016; Sajjadi+ 2016]
Related work | Data Augmentation
● About data augmentation
○ In unsupervised learning
19
■ Proposed to use data augmentation to model the invariance
of learned representations. [Dosovitskiy+ 2014]
Related work | Data Augmentation
● Difference between Dosovitskiy+ and IMSAT
20
○ IMSAT directly imposes the invariance on the learned representations
■ Dosovitskiy+ imposes invariance on surrogate classes, not
directly on the learned representations.
○ IMSAT focuses on learning discrete representations that are directly
usable for clustering and hash learning
■ Dosovitskiy+ focused on learning continuous representations
that are then used for other tasks such as classification and
clustering.
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion 21
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion 22
Method | about RIM
23
The RIM [Gomes+ 2010] learns a probabilistic classifier pθ(y|x) such that the
mutual information [Cover & Thomas 2012] between inputs and cluster
assignments is maximized. At the same time, it regularizes the complexity of the
classifier. Let X and Y ∈ {0, ..., K-1} denote random variables for data and cluster
assignments, respectively, where K is the number of clusters. The RIM objective
to minimize is

R(θ) - λ I(X; Y),

where R(θ) regularizes the classifier and λ is a trade-off parameter.
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion 24
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion 25
Method | about IMSAT
● Information maximization for learning discrete representations
26
Extend the RIM and consider learning M-dimensional discrete representations of
data. Let the output domain be Y = Y_1 × ... × Y_M, where Y_m = {0, 1, ..., V_m - 1}.
Let Y = (Y_1, ..., Y_M) ∈ Y be a random variable for the discrete representation.
Method | about IMSAT
● Information maximization for learning discrete representations
27
The goal is to learn a multi-output probabilistic classifier pθ(y_1, ..., y_M | x) that
maps similar inputs into similar representations. The conditional probability is
modeled by a deep neural network.
Under the model, the outputs y_1, ..., y_M are conditionally independent given x:

pθ(y_1, ..., y_M | x) = Π_{m=1..M} pθ(y_m | x)
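
As a concrete illustration, a minimal PyTorch sketch of such a factorized classifier (not the authors' code; the class name and sizes are my own, with clustering corresponding to M = 1, V = K and hashing to M = D, V = 2):

import torch
import torch.nn as nn

class MultiOutputClassifier(nn.Module):
    """pθ(y_1..y_M | x) = Π_m pθ(y_m | x): one shared body, M softmax heads."""
    def __init__(self, d: int, M: int, V: int, hidden: int = 1200):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(d, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList([nn.Linear(hidden, V) for _ in range(M)])

    def forward(self, x):
        h = self.body(x)
        # Each head defines an independent categorical distribution given x.
        return [torch.softmax(head(h), dim=1) for head in self.heads]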
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion 28
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion 29
Method | about IMSAT
● Regularization of deep neural networks via SAT
30
SAT uses data augmentation to impose the intended invariance on the data
representation. Let T : R^d → R^d denote a pre-defined data augmentation under
which the data representations should be invariant. The regularization of SAT made
on data point x is

R_SAT(θ; x, T(x)) = - Σ_{m=1..M} Σ_{y_m} pθ̂(y_m | x) log pθ(y_m | T(x)),

where θ̂ is the current estimate of the network parameters, treated as fixed.
Method | about IMSAT
● Regularization of deep neural networks via SAT
31
SAT uses data augmentation to impose the intended invariance on the data
representation. Let T : R^d → R^d denote a pre-defined data augmentation under
which the data representations should be invariant. The regularization of SAT made
on data point x is

R_SAT(θ; x, T(x)) = - Σ_{m=1..M} Σ_{y_m} pθ̂(y_m | x) log pθ(y_m | T(x))

pθ̂(y_m | x): the prediction on the original
data point x
Method | about IMSAT
● Regularization of deep neural networks via SAT
32
SAT uses data augmentation to impose the intended invariance on the data
representation. Let T : R^d → R^d denote a pre-defined data augmentation under
which the data representations should be invariant. The regularization of SAT made
on data point x is

R_SAT(θ; x, T(x)) = - Σ_{m=1..M} Σ_{y_m} pθ̂(y_m | x) log pθ(y_m | T(x))

pθ(y_m | T(x)): the prediction on the
augmented data point T(x)
Method | about IMSAT
● Regularization of deep neural networks via SAT
33
The regularization by SAT is then the average of R_SAT(θ; x, T(x)) over all the
training data points:

R_SAT(θ; T) = (1/N) Σ_{n=1..N} R_SAT(θ; x_n, T(x_n))

The augmentation function T adds a small perturbation r to the input and can be
expressed as:

T(x) = x + r
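
A minimal sketch of this penalty under the additive augmentation T(x) = x + r, assuming the multi-head classifier sketched earlier (sat_loss is a hypothetical helper name, not the authors' code):

import torch

def sat_loss(model, x, r):
    """R_SAT(θ; x, T(x)): cross-entropy between the fixed prediction on x
    and the prediction on the perturbed point x + r, summed over the heads."""
    with torch.no_grad():
        p_clean = [p.detach() for p in model(x)]  # pθ̂(y_m | x), constant target
    p_aug = model(x + r)                          # pθ(y_m | T(x))
    loss = x.new_zeros(())
    for p_c, p_a in zip(p_clean, p_aug):
        loss = loss - (p_c * torch.log(p_a + 1e-8)).sum(dim=1).mean()
    return loss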
Method | about IMSAT
● Regularization of deep neural networks via SAT
34
Two representative regularization methods are based on local perturbations:
● Random Perturbation Training (RPT) [Bachman+ 2014]
● Virtual Adversarial Training (VAT) [Miyato+ 2016]
In RPT, the perturbation r is sampled randomly; in VAT, it is chosen in an
adversarial direction:

r = argmax_{r' : ‖r'‖₂ ≤ ε} R_SAT(θ̂; x, x + r')
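
The exact argmax is intractable, so VAT approximates r with a gradient step from a small random probe direction (a sketch in the spirit of Miyato+ 2016, reusing the hypothetical sat_loss helper above; one power-iteration step is the common choice, not necessarily the paper's exact setting):

import torch

def vat_perturbation(model, x, eps=1.0, xi=1e-6):
    """Approximate r = argmax_{‖r'‖₂ ≤ ε} R_SAT(θ̂; x, x + r')."""
    d = torch.randn_like(x)
    d = xi * d / (d.norm(dim=1, keepdim=True) + 1e-12)  # tiny probe direction
    d.requires_grad_(True)
    grad = torch.autograd.grad(sat_loss(model, x, d), d)[0]
    r = eps * grad / (grad.norm(dim=1, keepdim=True) + 1e-12)
    return r.detach()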
Method | for Clustering
35
In clustering, we can directly apply the RIM.
By representing mutual information as the difference between marginal entropy and
conditional entropy [Cover & Thomas 2012], we have the objective to minimize:

R_SAT(θ; T) - λ [H(Y) - H(Y|X)]

The two entropy terms can be calculated as

H(Y) = h(pθ(y)) = h( (1/N) Σ_{n=1..N} pθ(y | x_n) )
H(Y|X) = (1/N) Σ_{n=1..N} h( pθ(y | x_n) )
Method | for Clustering
36
Here, h is the following entropy function:

h(p) = - Σ_y p(y) log p(y)

● Increasing the marginal entropy H(Y)
○ Encourages the cluster sizes to be uniform
● Decreasing the conditional entropy H(Y|X)
○ Encourages unambiguous cluster assignments [Bridle+ 1991]
Previous research shows that we can incorporate prior knowledge on
cluster sizes by modifying H(Y) [Gomes+ 2010]
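
Both entropy terms fall straight out of the softmax outputs. A sketch for a single clustering head, where probs is assumed to be the (batch, K) output of the classifier and the batch mean stands in for the dataset marginal:

import torch

def entropy(p, dim=-1):
    """h(p) = -Σ_y p(y) log p(y)."""
    return -(p * torch.log(p + 1e-8)).sum(dim=dim)

def mutual_information(probs):
    """I(X;Y) = H(Y) - H(Y|X), estimated from a mini-batch of pθ(y|x)."""
    p_marginal = probs.mean(dim=0)              # pθ(y): average of the conditionals
    h_y = entropy(p_marginal)                   # large when cluster sizes are uniform
    h_y_given_x = entropy(probs, dim=1).mean()  # small when assignments are unambiguous
    return h_y - h_y_given_x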
Method | for Clustering
37
H(Y) can be rewritten as follows:

H(Y) = log K - KL[pθ(y) ‖ U],

where U is the uniform distribution over the K clusters. Maximization of H(Y) is
equivalent to minimization of the KL divergence, which encourages the predicted
cluster distribution pθ(y) to be close to U.
Replacing U in the KL divergence with any specified class prior q(y) encourages
pθ(y) to be close to q(y) instead. We consider the following constrained
optimization problem:

min_θ R_SAT(θ; T) + λ H(Y|X)   subject to   KL[pθ(y) ‖ q(y)] ≤ δ
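
One simple way to handle the constraint is to fold it into the loss with a penalty weight (a sketch of the penalty form with an assumed fixed multiplier mu; it reuses the hypothetical helpers sketched above, not the paper's exact optimization scheme):

import torch

def clustering_objective(model, x, r, q, lam=0.1, mu=1.0):
    """R_SAT + lam * H(Y|X) + mu * KL(pθ(y) ‖ q): penalty form of the
    constrained problem, with q the specified class prior (e.g., uniform)."""
    probs = model(x)[0]                     # single clustering head
    p_y = probs.mean(dim=0)
    kl = (p_y * torch.log((p_y + 1e-8) / (q + 1e-8))).sum()
    h_y_given_x = -(probs * torch.log(probs + 1e-8)).sum(dim=1).mean()
    return sat_loss(model, x, r) + lam * h_y_given_x + mu * kl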
Method | for Hash Learning
38
It follows from the definition of interaction information and the conditional
independence of the output bits given x that

I({X, Y_m, Y_m'}) = I(Y_m ; Y_m' | X) - I(Y_m ; Y_m') = - I(Y_m ; Y_m')

Keeping the interaction terms of up to two output variables, this gives us

I(X; Y_1, ..., Y_D) ≈ Σ_{m=1..D} I(X; Y_m) - Σ_{m<m'} I(Y_m ; Y_m')
Method | for Hash Learning
39
In hash learning, each data point is mapped into a D-bit binary code, so the
original RIM is not directly applicable.
The computation of the mutual information of a D-bit binary code is intractable for
large D because it involves a summation over an exponential number of terms.
[Brown 2009] shows that the mutual information can be expanded as a sum of
interaction information terms:

I(X; Y_1, ..., Y_D) = Σ_{T ⊆ {Y_1,...,Y_D}, |T| ≥ 1} I({X} ∪ T)
Method | for Hash Learning
40
In summary, our approximated objective to minimize is

R_SAT(θ; T) - λ [ Σ_{m=1..D} I(X; Y_m) - Σ_{m<m'} I(Y_m ; Y_m') ]

● First term
○ Regularizes the neural network
● Second term
○ Maximizes the mutual information between data and each hash bit
● Third term
○ Removes the redundancy among the hash bits
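
Because each Y_m is binary, the pairwise redundancy terms I(Y_m; Y_m') are cheap to estimate from the sigmoid outputs. A sketch (my own, under the model's conditional independence of the bits given x), where p is assumed to be the (batch, D) tensor of per-bit probabilities pθ(Y_m = 1 | x):

import torch

def pairwise_bit_mi(p, eps=1e-8):
    """Σ_{m<m'} I(Y_m; Y_m'), with joints estimated from the mini-batch."""
    B, D = p.shape
    total = p.new_zeros(())
    for m in range(D):
        for n in range(m + 1, D):
            pm, pn = p[:, m], p[:, n]
            # joint of (Y_m, Y_n): batch average of the product of conditionals
            joint = torch.stack([((1 - pm) * (1 - pn)).mean(), ((1 - pm) * pn).mean(),
                                 (pm * (1 - pn)).mean(), (pm * pn).mean()])
            indep = torch.outer(torch.stack([1 - pm.mean(), pm.mean()]),
                                torch.stack([1 - pn.mean(), pn.mean()])).reshape(-1)
            total = total + (joint * torch.log(joint / (indep + eps) + eps)).sum()
    return total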
Method | Marginal Distribution
41
It is necessary to calculate the marginal distribution pθ(y) when computing the
mutual information. Computed exactly, this requires the entire dataset, which is not
suitable for mini-batch SGD. Therefore, we use the following approximation,
replacing the dataset average with the mini-batch average:

pθ(y) ≈ (1/|B|) Σ_{x ∈ B} pθ(y | x),   where B is a mini-batch.

In the case of clustering, the approximated objective that we actually minimize is an
upper bound of the exact objective that we try to minimize.
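
Tying the pieces together, one mini-batch update could look like the following (a speculative glue sketch built from the hypothetical helpers above; the hyperparameters are placeholders, not the paper's values):

import torch

def train_step(model, optimizer, x, q, eps=1.0):
    optimizer.zero_grad()
    r = vat_perturbation(model, x, eps=eps)      # adversarial augmentation direction
    loss = clustering_objective(model, x, r, q)  # mini-batch-approximated objective
    loss.backward()
    optimizer.step()                             # mini-batch SGD (Adam in the paper)
    return loss.item()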
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion 42
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion 43
Experiments | Overview
44
● About implementation
● About clustering
● About hash learning
Experiments | about implementation
45
● Clustering
○ Set the network dimensionality to d-1200-1200-M
○ Use softmax as the output layer
● Hash learning
○ Use smaller network sizes to ensure fast computation of mapping
data into hash codes (will be shown later).
○ Use sigmoid as the output layer
● Use Adam, ReLU, BatchNorm
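
A sketch of the clustering network as described on this slide, in PyTorch (d-1200-1200-M with ReLU and BatchNorm, a softmax output layer, and Adam; the exact layer ordering is an assumption):

import torch.nn as nn
import torch.optim as optim

def build_clustering_net(d: int, M: int):
    """d-1200-1200-M MLP with ReLU + BatchNorm and a softmax output."""
    net = nn.Sequential(
        nn.Linear(d, 1200), nn.BatchNorm1d(1200), nn.ReLU(),
        nn.Linear(1200, 1200), nn.BatchNorm1d(1200), nn.ReLU(),
        nn.Linear(1200, M), nn.Softmax(dim=1),
    )
    return net, optim.Adam(net.parameters())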
Experiments | clustering
46
● About baseline models
Experiments | clustering
47
● About datasets
Experiments | clustering
48
● About evaluation metric
○ Evaluate with Unsupervised clustering accuracy (ACC)
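
ACC searches for the best one-to-one mapping between predicted cluster indices and ground-truth labels. A standard implementation with the Hungarian algorithm (scipy), not code from the paper:

import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """ACC = max over label permutations of the fraction of matched points."""
    K = int(max(y_true.max(), y_pred.max())) + 1
    count = np.zeros((K, K), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1                      # co-occurrence matrix
    row, col = linear_sum_assignment(-count)  # maximize total matches
    return count[row, col].sum() / len(y_true)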
Experiments | clustering
49
● Experiment result
Experiments | clustering
50
● Experiment result
Experiments | clustering
51
● Experiment result
Experiments | hash learning
52
● About dataset
○ MNIST / CIFAR-10
● About baseline models
○ Spectral hashing [Weiss+ 2009]
○ PCA-ITQ [Gong+ 2013]
○ Deep Hash [Erin Liong+ 2015]
○ Linear RIM / Deep RIM / IMSAT(VAT)
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion 53
Experiments | hash learning
54
Experiments | hash learning
55
● About evaluation metric
○ Mean Average Precision (mAP)
○ Precision at N = 500 samples
○ Hamming distance
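
For reference, precision at N under Hamming ranking can be computed as below (a generic sketch, not tied to the paper's evaluation code; codes are assumed to be 0/1 arrays):

import numpy as np

def precision_at_n(query_codes, db_codes, query_labels, db_labels, n=500):
    """Rank the database by Hamming distance to each query and report the
    fraction of the top-n retrieved items sharing the query's label."""
    dist = (query_codes[:, None, :] != db_codes[None, :, :]).sum(axis=2)
    precisions = []
    for i in range(len(query_codes)):
        top = np.argsort(dist[i], kind="stable")[:n]
        precisions.append((db_labels[top] == query_labels[i]).mean())
    return float(np.mean(precisions))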
Contents
● Introduction
● Related work
● Method : IMSAT = IM + SAT
○ Information Maximization (IM)
○ Self-Augmented Training (SAT)
● Experiments
● Conclusion 56
Conclusion | IMSAT
57
● Proposed “IMSAT”
○ An information-theoretic method for unsupervised discrete
representation learning using deep neural networks
● Directly introduces invariance to data augmentation in
an end-to-end fashion
○ Learns discrete representations that are robust to small
perturbations and affine transformations