Deep Learning via Semi-Supervised Embedding (materials for the 7th Deep Learning study group; Ohsawa)

From Deep Learning Japan @ The University of Tokyo.
http://www.facebook.com/DeepLearning
https://sites.google.com/site/deeplearning2013/

  1. Deep Learning via Semi-Supervised Embedding
     Summarized by Shohei Ohsawa, The University of Tokyo (ohsawa@weblab.t.u-tokyo.ac.jp)
  2. Summary
     • Title: Deep Learning via Semi-Supervised Embedding
     • Authors: Jason Weston (NEC Labs America, USA), Frédéric Ratle (IGAR, University of Lausanne, Switzerland), Ronan Collobert (NEC Labs America, USA)
     • Venue: Proceedings of the 25th International Conference on Machine Learning (ICML 2008)
     • Contents: Enhancing semi-supervised learning to support deep architectures
     • Information: Citations: 88
  3. AGENDA
     1. Introduction
     2. Semi-supervised Embedding
     3. Semi-supervised Embedding for Deep Learning
     4. Existing Approaches to Deep Learning
     5. Experimental Results
     6. Conclusion
  5. Introduction: Background
     • Embedding data into a lower-dimensional space is the goal of unsupervised dimensionality reduction techniques, which have been intensively studied.
     • Most algorithms are developed with the motivation of producing a useful analysis and visualization tool.
     [Figure: unlabelled data on a manifold and its embedding]
  6. Introduction: Semi-supervised Learning
     • Recently, the field of semi-supervised learning [Chapelle 2006], which has the goal of improving generalization on supervised tasks using unlabeled data, has made use of many of these techniques.
     • Example: researchers have used nonlinear embedding or cluster representations (unsupervised) as features for a supervised classifier, with improved results.
     [Figure: labelled data and unlabelled data]
  7. Introduction: Issue of Existing Architectures
     • Most of these architectures are disjoint and shallow.
       • Disjoint: the unsupervised dimensionality reduction algorithm is trained on unlabeled data separately as a first step, and its result is then passed to a supervised classifier.
       • Shallow: the supervised classifier has a shallow architecture, such as a (kernelized) linear model.
       • Example: [Chapelle 2003][Chapelle 2005] learn a clustering or a distance measure based on a nonlinear manifold embedding as a first step.
     • Joint methods
       • Transductive Support Vector Machines (TSVMs) [Vapnik 1998]
       • LapSVM [Belkin 2006]
       • These use labeled and unlabeled data jointly, but their architecture is still shallow.
  8. Introduction: Semi-supervised Learning
     • Deep architectures seem a natural choice for hard AI tasks that involve several sub-tasks, which can be coded into the layers of the architecture.
     • As argued by several researchers [Hinton 2006][Bengio 2007], semi-supervised learning is also natural in such a setting, as otherwise one is not likely ever to have enough labeled data to perform well.
  9. Introduction: Deep Architecture for Semi-supervised Learning
     • Several authors have recently proposed methods for using unlabeled data in deep neural network-based architectures.
     • These methods either perform greedy layer-wise pre-training of weights using unlabeled data alone, followed by supervised fine-tuning (comparable to the disjoint shallow techniques for semi-supervised learning described before),
     • or learn unsupervised encodings at multiple levels of the architecture jointly with a supervised signal.
     The basic setup:
     1. Choose an unsupervised learning algorithm.
     2. Choose a model with a deep architecture.
     3. Plug the unsupervised learning algorithm into any layer (or layers) of the architecture.
     4. Train the supervised and unsupervised tasks simultaneously using the same architecture.
  10. Introduction: Semi-supervised Learning
      • The aim is that the unsupervised method will improve accuracy on the task at hand.
      • However, the unsupervised methods proposed so far for deep architectures are, in our opinion, somewhat complicated and restricted.
      • They include:
        • a generative model (a restricted Boltzmann machine) [Hinton 2006]
        • autoencoders [Bengio 2007]
        • sparse encoding [Ranzato 2007]
      • Moreover, in all cases these methods are not compared with, and appear on the surface to be completely different from, algorithms developed by researchers in the field of semi-supervised learning.
  11. Introduction: Objective
      • In this presentation, we advocate simpler ways of performing deep learning by leveraging existing ideas from semi-supervised algorithms developed so far for shallow architectures.
      • In particular, we focus on the idea of combining an embedding-based regularizer with a supervised learner to perform semi-supervised learning, as in Laplacian SVMs [Belkin et al., 2006].
      • We show that this method:
        • (i) can be generalized to multi-layer networks and trained by stochastic gradient descent;
        • (ii) is valid in the deep learning framework given above.
  13. Semi-supervised Embedding: Structure Assumption
      • Structure assumption: points within the same structure (such as a cluster or a manifold) are likely to have the same label.
      • Algorithms that rely on this assumption:
        • cluster kernels [Chapelle et al., 2003]
        • LDS [Chapelle & Zien, 2005]
        • label propagation [Zhu & Ghahramani, 2002]
        • LapSVM [Belkin et al., 2006]
      • To understand these methods we will first review some relevant approaches to linear and nonlinear embedding.
      [Figure: labelled data on a manifold. Labels can be estimated for points that lie on the same manifold.]
  14. Semi-supervised Embedding: Embedding Algorithms
      [Equation image not captured in the transcript: an objective of the form "minimize ... s.t. balancing constraint"]
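
The formula on this slide was an image and is missing from the transcript. As a hedged reconstruction from the summarized paper (not a copy of the slide), the general embedding objective has roughly the following form, where α denotes the parameters of the embedding function f, W_ij is the similarity between x_i and x_j, and U is the number of unlabeled points. These symbol names are assumptions taken from the paper's notation.

```latex
\min_{\alpha} \; \sum_{i,j=1}^{U} L\bigl(f(x_i,\alpha),\, f(x_j,\alpha),\, W_{ij}\bigr)
\quad \text{s.t. a balancing constraint}
```

The balancing constraint is what rules out the trivial solution in which every point is mapped to the same location.
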
  15. Semi-supervised Embedding: Embedding Algorithms: Multidimensional Scaling (MDS)
      • A classical algorithm that attempts to preserve the distances between points while embedding them in a lower-dimensional space.
      • MDS is equivalent to PCA if the metric is Euclidean [Williams, 2001].
      [Figure: manifold]
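
The equivalence between MDS and PCA under a Euclidean metric can be checked numerically. The sketch below is an illustration only, assuming the classical (double-centering) form of MDS; the function names and the toy data are hypothetical and do not come from the slides.

```python
import numpy as np

def classical_mds(X, dim):
    """Classical MDS computed from pairwise Euclidean distances only."""
    n = X.shape[0]
    # Squared Euclidean distance matrix.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Double centering turns squared distances into a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # Top eigenvectors, scaled by sqrt(eigenvalue), give the embedding.
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dim]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

def pca_project(X, dim):
    """Project centered data onto its top principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T

rng = np.random.default_rng(0)
# Toy data with clearly separated variances along each axis.
X = rng.normal(size=(30, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
Y_mds = classical_mds(X, 2)
Y_pca = pca_project(X, 2)
# The two embeddings agree up to the sign of each coordinate.
same_up_to_sign = np.allclose(np.abs(Y_mds), np.abs(Y_pca), atol=1e-6)
```

The comparison uses absolute values because each eigenvector is only determined up to sign.
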
  16. Semi-supervised Embedding: Embedding Algorithms: ISOMAP [Tenenbaum 2000]
      [Figure label: W = 1/8]
  17. Semi-supervised Embedding: Embedding Algorithms: Laplacian Eigenmaps [Belkin & Niyogi 2003]
      [Figure label: Laplacian kernel]
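
The slide's formula is not in the transcript. As a hedged reminder of the usual Laplacian Eigenmaps formulation (written here in the notation of the summarized paper, with f_i = f(x_i) and D_ii = Σ_j W_ij as assumed symbols), nearby points are pulled together in proportion to their similarity, and a constraint prevents the embedding from collapsing:

```latex
\min_{f} \; \sum_{i,j} W_{ij}\, \lVert f_i - f_j \rVert^{2}
\quad \text{s.t.} \quad f^{\top} D f = I, \qquad f^{\top} D \mathbf{1} = 0
```
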
  18. Semi-supervised Embedding: Semi-supervised Algorithms
      • L: labelled data
      • U: unlabelled data
  19. Semi-supervised Embedding: Semi-supervised Algorithms: Label Propagation [Zhu & Ghahramani, 2002]
      [Equation image not captured in the transcript; its terms are annotated "Regularize" and "Loss"]
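
As a rough illustration of label propagation, the sketch below uses the iterative "propagate, then clamp the labeled points" variant rather than the regularized loss that the slide's annotations refer to. The graph, the function name, and the iteration count are hypothetical choices for the example.

```python
import numpy as np

def label_propagation(W, y, labeled, n_classes, n_iter=200):
    """Spread labels over a graph with similarity matrix W.

    y holds class indices (only entries listed in `labeled` are used).
    """
    n = W.shape[0]
    # Row-normalize similarities into transition probabilities.
    T = W / W.sum(axis=1, keepdims=True)
    Y0 = np.zeros((n, n_classes))
    Y0[labeled, y[labeled]] = 1.0
    Y = Y0.copy()
    for _ in range(n_iter):
        Y = T @ Y                  # each node averages its neighbours' label scores
        Y[labeled] = Y0[labeled]   # clamp the known labels
    return Y.argmax(axis=1)

# Chain graph 0-1-2-3-4-5 with only the two end points labeled.
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0
y = np.array([0, 0, 0, 0, 0, 1])      # only y[0] and y[5] are used
labeled = np.array([0, 5])
pred = label_propagation(W, y, labeled, n_classes=2)
```

On this chain each unlabeled node ends up with the label of the nearer labeled end point.
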
  20. Semi-supervised Embedding: Semi-supervised Algorithms: LapSVM [Belkin et al., 2006]
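
The LapSVM formula is missing from the transcript. A hedged reconstruction, following the form used in the summarized paper, adds a Laplacian Eigenmaps style regularizer over all L + U points to a standard SVM objective. The symbols γ and λ (trade-off weights) and H (hinge loss) are assumptions taken from that notation.

```latex
\min_{w,b} \; \lVert w \rVert^{2}
  + \gamma \sum_{i=1}^{L} H\bigl(y_i f(x_i)\bigr)
  + \lambda \sum_{i,j=1}^{L+U} W_{ij}\, \lVert f(x_i) - f(x_j) \rVert^{2},
\qquad H(z) = \max(0,\, 1 - z), \quad f(x) = w \cdot x + b
```
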
  22. Semi-supervised Embedding for Deep Learning: Overview
      • We would like to use the ideas developed in semi-supervised learning for deep learning. Deep learning consists of learning a model with several layers of non-linear mappings.
      • We will consider multi-layer networks with M layers of hidden units that give a C-dimensional output vector [equation image not captured in the transcript].
        • w^O: the weights of the output layer
        • Typically the k-th layer is defined as [equation image not captured in the transcript].
        • S: a non-linear squashing function such as tanh
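
A minimal sketch of such a network's forward pass, assuming the usual fully connected form in which each hidden layer applies S to an affine map of the previous layer and the output layer is linear. The layer sizes, the random weights, and the function name are hypothetical.

```python
import numpy as np

def forward(x, hidden_layers, output_layer, S=np.tanh):
    """Return the C-dimensional output f(x) and all hidden activations."""
    h = x
    activations = []
    for W, b in hidden_layers:
        # k-th hidden layer: h^k = S(W^k h^(k-1) + b^k), with h^0 = x.
        h = S(W @ h + b)
        activations.append(h)
    W_out, b_out = output_layer
    # Output layer: f(x) = W^O h^M + b^O.
    return W_out @ h + b_out, activations

rng = np.random.default_rng(0)
sizes = [4, 8, 8]   # input dimension 4, then M = 2 hidden layers of 8 units
C = 3               # output dimension
hidden = [(rng.normal(size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
output = (rng.normal(size=(C, sizes[-1])), np.zeros(C))
f, hs = forward(rng.normal(size=4), hidden, output)
```

Returning the hidden activations alongside the output makes it possible to attach a regularizer to any layer, not only to the output.
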
  23. Semi-supervised Embedding for Deep Learning: Overview
      • Here, we describe a standard fully connected multi-layer network, but prior knowledge about a particular problem could lead one to other network designs.
      • For example, in sequence and image recognition, time delay and convolutional networks (TDNNs and CNNs) [LeCun et al., 1998] have been very successful.
      • In those approaches one introduces layers that apply convolutions to their input and so take into account locality information in the data, i.e. they learn features from image patches or from windows within a sequence.
  24. Semi-supervised Embedding for Deep Learning: Three Modes of Embedding in Deep Architectures
      • The general method we propose for semi-supervised deep learning is to add a semi-supervised regularizer to a deep architecture in one of three different modes:
        (a) Output
        (b) Internal
        (c) Auxiliary
  25. Semi-supervised Embedding for Deep Learning: Three Modes of Embedding in Deep Architectures: (a) Output
      • Add a semi-supervised loss (regularizer) to the supervised loss on the entire network's output.
      • This is most similar to the shallow techniques described before.
  26. Semi-supervised Embedding for Deep Learning: Three Modes of Embedding in Deep Architectures: (b) Internal
      • Regularize the k-th hidden layer (7) directly [equation image not captured in the transcript].
      • The regularized quantity is the output of the network up to the k-th hidden layer.
  27. Semi-supervised Embedding for Deep Learning: Three Modes of Embedding in Deep Architectures: (c) Auxiliary
      • Create an auxiliary network that shares the first k layers of the original network but has a new final set of weights [equation image not captured in the transcript].
      • We train this network to embed unlabeled data at the same time as we train the original network on labeled data.
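
The formulas for the three modes were images and are not in the transcript. The following is a hedged reconstruction based on the summarized paper: ℓ is the supervised loss, L the pairwise embedding loss, λ a trade-off weight, f^k(x) the vector of k-th layer activations, and g the auxiliary output. All symbol names are assumptions taken from that notation.

```latex
\begin{align}
\text{(a) Output:} \quad
  & \sum_{i=1}^{L} \ell\bigl(f(x_i), y_i\bigr)
    + \lambda \sum_{i,j=1}^{L+U} L\bigl(f(x_i), f(x_j), W_{ij}\bigr) \\
\text{(b) Internal:} \quad
  & \sum_{i=1}^{L} \ell\bigl(f(x_i), y_i\bigr)
    + \lambda \sum_{i,j=1}^{L+U} L\bigl(f^{k}(x_i), f^{k}(x_j), W_{ij}\bigr),
    \quad f^{k}(x) = \bigl(h^{k}_{1}(x), \dots, h^{k}_{N}(x)\bigr) \\
\text{(c) Auxiliary:} \quad
  & \sum_{i=1}^{L} \ell\bigl(f(x_i), y_i\bigr)
    + \lambda \sum_{i,j=1}^{L+U} L\bigl(g(x_i), g(x_j), W_{ij}\bigr),
    \quad g_{i}(x) = \sum_{j} w^{AUX,i}_{j}\, h^{k}_{j}(x) + b^{AUX,i}
\end{align}
```

The three modes differ only in which representation the embedding loss is applied to: the final output, a hidden layer, or a separate output head that shares the first k layers.
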
  28. Semi-supervised Embedding for Deep Learning: Algorithm
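
The algorithm itself was an image and is missing from the transcript. The sketch below is an assumption-laden illustration of the general idea of alternating a supervised gradient step with embedding steps on a neighbor pair and on a random pair. For brevity it uses a linear scorer f(x) = w·x + b in place of a multi-layer network, applies the embedding loss to the output, and uses a margin-based pairwise loss. The hyperparameters, the toy data, and all function names are hypothetical.

```python
import numpy as np

def knn_pairs(X, k):
    """List (i, j) pairs where j is one of the k nearest neighbours of i."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]
    return [(i, j) for i in range(len(X)) for j in nn[i]]

def train(X, y, labeled, pairs, lam=0.1, lr=0.05, margin=1.0,
          steps=3000, seed=0):
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        # 1) Supervised step: hinge loss on a random labeled example.
        i = rng.choice(labeled)
        if y[i] * (w @ X[i] + b) < 1.0:
            w += lr * y[i] * X[i]
            b += lr * y[i]
        # 2) Embedding step on a neighbour pair: pull the outputs together
        #    (gradient of lam * (f_i - f_j)^2; the bias cancels).
        i, j = pairs[rng.integers(len(pairs))]
        diff = X[i] - X[j]
        w -= lr * lam * 2.0 * (w @ diff) * diff
        # 3) Embedding step on a random pair treated as non-neighbours:
        #    push the outputs apart until they differ by at least `margin`.
        i, j = rng.integers(len(X), size=2)
        diff = X[i] - X[j]
        d = w @ diff
        gap = margin - abs(d)
        if gap > 0:
            w += lr * lam * 2.0 * gap * np.sign(d) * diff
    return w, b

# Toy problem: two clusters, only three labeled points per class.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-4, 0], 1.0, size=(100, 2)),
               rng.normal([4, 0], 1.0, size=(100, 2))])
y = np.array([-1] * 100 + [1] * 100)
labeled = np.array([0, 1, 2, 100, 101, 102])
w, b = train(X, y, labeled, knn_pairs(X, k=5))
accuracy = np.mean(np.sign(X @ w + b) == y)
```

Because every step touches only one example or one pair, the unlabeled data never has to be processed as a whole, which is what makes this style of training suitable for stochastic gradient descent.
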
  29. Semi-supervised Embedding for Deep Learning: Labeling Unlabeled Data as Neighbors
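
The content of this slide is not in the transcript. One common way to decide which unlabeled points count as neighbors, assumed here purely for illustration, is a k-nearest-neighbor graph with W_ij = 1 for neighbors and 0 otherwise. The function name and the toy points are hypothetical.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric 0/1 matrix: W[i, j] = 1 if i and j are k-nearest neighbours."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)           # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]     # indices of the k closest points
    W = np.zeros((n, n), dtype=int)
    W[np.repeat(np.arange(n), k), nn.ravel()] = 1
    return np.maximum(W, W.T)              # symmetrize

# Two well-separated groups of three points on a line.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
W = knn_adjacency(X, k=2)
```

With k = 2 every point is linked to the other two points of its own group and to nothing in the other group.
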
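  31. Deep Boltzmann Machine
      • A type of Boltzmann machine
      • Structured like several RBMs stacked on top of one another
      [Figure labels: hidden layer II, hidden layer I, input layer; energy function with node values and biases]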
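
The energy function on this slide was an image. For a network with an input layer v and two hidden layers h^1 and h^2, the standard form is given below as a hedged reminder. The weight matrices W^1, W^2 and the bias vectors b, c^1, c^2 are assumed symbol names, not taken from the slide.

```latex
E(v, h^{1}, h^{2}) =
  - v^{\top} W^{1} h^{1}
  - (h^{1})^{\top} W^{2} h^{2}
  - b^{\top} v
  - (c^{1})^{\top} h^{1}
  - (c^{2})^{\top} h^{2}
```
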
  32. Auto-encoder
      • The auto-encoder framework [LeCun 1987][Bourlard 1988][Hinton 1994] is one of the unsupervised feature construction methods.
      • The prefix auto- means "self": an auto-encoder is literally a "self-encoder".
      • It consists of three elements: an encoder, a decoder, and a reconstruction error.
      • Training makes the composition of the encoder and the decoder reproduce the input.
      • Training is done by minimizing the error between the input and the output (the reconstruction error).
      • This yields an auto-encoder that maps the input to a more suitable representation.
      [Figure labels: (auto-)encoder, decoder, reconstruction, representation vector, t-th input vector, output vector, reconstruction error]
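
The three elements can be made concrete with a very small example. The sketch below assumes a purely linear encoder and decoder trained by plain gradient descent on the mean squared reconstruction error. The data, the layer sizes, and the learning rate are hypothetical, and real auto-encoders typically add non-linearities.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy inputs: 6-dimensional vectors that mostly vary along 2 hidden directions.
Z = rng.normal(size=(200, 2))
A = rng.normal(size=(2, 6))
X = Z @ A + 0.05 * rng.normal(size=(200, 6))

W_enc = 0.1 * rng.normal(size=(6, 2))   # encoder: input -> representation
W_dec = 0.1 * rng.normal(size=(2, 6))   # decoder: representation -> output
lr = 0.005
errors = []
for _ in range(2000):
    H = X @ W_enc             # representation vectors
    X_hat = H @ W_dec         # reconstructions
    R = X_hat - X             # residual between output and input
    # Reconstruction error: mean squared norm of the residual.
    errors.append(np.mean(np.sum(R ** 2, axis=1)))
    # Gradients of the reconstruction error with respect to both weight matrices.
    grad_dec = 2.0 * H.T @ R / len(X)
    grad_enc = 2.0 * X.T @ (R @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
```

After training, `X @ W_enc` gives a 2-dimensional representation of each input, and the list `errors` records how the reconstruction error falls during training.
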
  34. Experimental Results
  36. Conclusion
      • In this work, we showed how one can improve supervised learning for deep architectures if one jointly learns an embedding task using unlabeled data.
      • Our results both confirm previous findings and generalize them.
      • Researchers using shallow architectures had already shown two ways of using embedding to improve generalization:
        • (i) embedding unlabeled data as a separate pre-processing step (i.e., first-layer training);
        • (ii) using embedding as a regularizer (i.e., at the output layer).
      • More importantly, we generalized these approaches to the case where we train a semi-supervised embedding jointly with a supervised deep multi-layer architecture on any (or all) layers of the network.
  37. Research topic: Like Prediction
  38. Research topic: Like Prediction
  39. Hierarchical clustering of graphs using deep architectures
      [Figure labels: auto-encoder; graph clusters (communities); graph]
  40. DISCUSSION