SlideShare a Scribd company logo
eep Learning via Semi-Supervised
Embedding
Summarized by
Shohei Ohsawa
The University of Tokyo
ohsawa@weblab.t.u-tokyo.ac.jp
D
Summary
• Deep Learning via Semi-Supervised Embedding
• Jason Weston (NEC Labs America, USA)
• Frédéric Ratle (IGAR, University of Lausanne, Switzerland)
• Ronan Collobert (NEC Labs America, USA)
• Proceedings of the 25th International Conference on Machine Learning
(ICML2008)
• Enhancing semi-supervised learning to support deep architecture
• Citations: 88
1
Title
Author
Contents
Venue
Information
ADGENDA
2
1. Introduction
2. Semi-supervised Embedding
3. Semi-supervised Embedding for Deep Learning
4. Existing Approaches to Deep Learning
5. Experimental Result
6. Conclusion
ADGENDA
3
1. Introduction
2. Semi-supervised Embedding
3. Semi-supervised Embedding for Deep Learning
4. Existing Approaches to Deep Learning
5. Experimental Result
6. Conclusion
1. Introduction
Background
• Embedding data into a lower dimensional space are unsupervised dimensionality reduction
techniques that have been intensively studied.
• Most algorithms are developed with the motivation of producing a useful analysis and
visualization tool.
4
Unlabelled data
Manifold Embedding
1. Introduction
Semi-supervised Learning
• Recently, the field of semi-supervised learning [Chapelle 2006], which has the goal of
improving generalization on supervised tasks using unlabeled data, has made use of many
of these techniques.
• Ex.) researchers have used nonlinear embedding or cluster representations (unsupervised)
as features for a supervised classifier, with improved results.
5
Labelled data
Unlabelled data
1. Introduction
Issue of Existing Architectures
• Most of these architectures are disjoint and shallow
• Joint Methods
• Transductive Support Vector Machines (TSVMs) [Vapnik 1998]
• LapSVM [Belkin 2006]
•   their architecture is still shallow.
6
The unsupervised dimensionality reduction
algorithm is trained on unlabeled data separately
as a first step
a supervised classifier which has a
shallow architecture such as a
(kernelized) linear model.
• [Chapelle 2003][Chapelle 2005] learn a
clustering or a distance measure based on a
nonlinear manifold embedding as a first step
1. Introduction
Semi-supervised Learning
• Deep architectures seem a natural choice in hard AI tasks which involve several sub-tasks
which can be coded into the layers of the architecture.
• As argued by several researchers [Hinton 2006][Bengio 2007] semi-supervised learning is
also natural in such a setting as otherwise one is not likely to ever have enough labeled
data to perform well.
7
1. Introduction
Deep Architecture for Semi-supervised Learning
• Several authors have recently proposed methods for using unlabeled data in deep neural
network-based architectures.
• These methods either perform a greedy layer-wise pre-training of weights using unlabeled
data alone followed by supervised fine-tuning (which can be compared to the disjoint
shallow techniques for semi-supervised learning described before),
• Or learn unsupervised encodings at multiple levels of the architecture jointly with a
supervised signal.
The basic setup
8
1. Choose an unsupervised
learning algorithm.
2. Choose a model
with a deep
architecture.
3. The unsupervised
learning is plugged
into any layers of the
architecture
4. Train supervised
and unsupervised
tasks using the
same architecture
simultaneously
1. Introduction
Semi-supervised Learning
• The aim is that the unsupervised method will improve accuracy on the task at hand.
• However, the unsupervised methods so far proposed for deep architectures are in our
opinion somewhat complicated and restricted.
• They include:
• generative model (a restricted Boltzmann machine) [Hinton 2006]
• autoencoders [Bengio 2007]
• sparse encoding [Ranzato 2007]
• Moreover, in all cases these methods are not compared with, and appear on the surface to
be completely different to, algorithms developed by researchers in the field of semi-
supervised learning.
9
1. Introduction
Objective
• In this presentation, we advocate simpler ways of performing deep learning by leveraging
existing ideas from semi-supervised algorithms so far developed in shallow architectures.
• In particular, we focus on the idea of combining an embedding-based regularizer with a
supervised learner to perform semi-supervised learning
• Laplacian SVMs [Belkin et al.,2006]
• We show that this method can be:
• (i) generalized to multi-layer networks and trained by stochastic gradient descent
• (ii) is valid in the deep learning framework given above.
10
ADGENDA
12
1. Introduction
2. Semi-supervised Embedding
3. Semi-supervised Embedding for Deep Learning
4. Existing Approaches to Deep Learning
5. Experimental Result
6. Conclusion
2. Semi-supervised Embedding
Structure Assumption
• Structure Assumption: points within the same structure (such as a cluster or a manifold)
are likely to have the same label.
• Algorithms
• cluster kernels [Chapelle et al., 2003]
• LDS [Chapelle & Zien,2005]
• label propagation [Zhu & Ghahramani, 2002]
• LapSVM [Belkin et al., 2006]
• To understand these methods we will first review some relevant approaches to linear and
nonlinear embedding.
13
Labelled data
The labels can be estimated
as soon as the points on a
same manifold
2. Semi-supervised Embedding
Embedding Algorithms
14
minimize
s.t. Balancing Constant
2. Semi-supervised Embedding
Embedding Algorithms: Multidimensional Scaling (MDS)
• A classical algorithm that attempts to preserve the distance between points, whilst
embedding them in a lower dimensional space
• MDS is equivalent to PCA if the metric is Euclidean [Williams, 2001]
15
Manifold
2. Semi-supervised Embedding
Embedding Algorithms: ISOMAP [Tenenbaum 2000]
16
W=1/8
2. Semi-supervised Embedding
Embedding Algorithms: Laplacian Eigenmaps [Belkin & Niyogi 2003]
17
Laplacian Kernel
2. Semi-supervised Embedding
Semi-superbised Algorithms
19
• L: Labelled Data
• U: Unlabelled Data
2. Semi-supervised Embedding
Semi-superbised Algorithms: Label Propagation [Zhu & Ghahramani, 2002]
20
RegularizeLoss
2. Semi-supervised Embedding
Semi-superbised Algorithms: LapSVM [Belkin et al., 2006]
21
1
10
ADGENDA
22
1. Introduction
2. Semi-supervised Embedding
3. Semi-supervised Embedding for Deep Learning
4. Existing Approaches to Deep Learning
5. Experimental Result
6. Conclusion
3. Semi-supervised Embedding for Deep Learning
Overview
• We would like to use the ideas developed in semi-supervised learning for deep learning.
Deep learning consists of learning a model with several layers of non-linear mapping.
• We will consider multi-layer networks with M layers of hidden units that give a C-
dimensional output vector:
• wO: the weights for the output layer
• typically the kth layer is defined as
• S: a non-linear squashing function such as tanh.
23
3. Semi-supervised Embedding for Deep Learning
Overview
• Here, we describe a standard fully connected multi-layer network but prior knowledge about
a particular problem could lead one to other network designs.
• For example in sequence and image recognition time delay and convolutional networks
(TDNNs and CNNs) [Le-Cun et al., 1998] have been very successful.
• In those approaches one introduces layers that apply convolutions on their input which
take into account locality information in the data, i.e. they learn features from image
patches or windows within a sequence.
24
3. Semi-supervised Embedding for Deep Learning
Three Modes of Embedding in Deep Architectures
• The general method we propose for semi-supervised deep learning is to add a semi-
supervised regularizer in deep architectures in one of three different modes
25
(a) Output (b) Internal (c) Auxiliary
3. Semi-supervised Embedding for Deep Learning
Three Modes of Embedding in Deep Architectures: (a) Output
• Add a semi-supervised loss (regularizer) to the supervised loss on the entire network’s
output
• This is most similar to the shallow techniques described before
26
3. Semi-supervised Embedding for Deep Learning
Three Modes of Embedding in Deep Architectures: (b) Internal
• Regularize the k th hidden layer (7) directly:
27
• is the output of the
network up to the hidden layer.
3. Semi-supervised Embedding for Deep Learning
Three Modes of Embedding in Deep Architectures: (c) Auxiliary
• Create an auxiliary network which shares the first k layers of the original network but has a
new final set of weights:
• We train this network to embed unlabeled data simultaneously as we train the original
network on labeled data.
28
3. Semi-supervised Embedding for Deep Learning
Algorithm
30
3. Semi-supervised Embedding for Deep Learning
Labeling unlabeled data as Neighbors
31
ADGENDA
36
1. Introduction
2. Semi-supervised Embedding
3. Semi-supervised Embedding for Deep Learning
4. Existing Approaches to Deep Learning
5. Experimental Result
6. Conclusion
Deep Boltzmann Machine
• ボルツマンマシンの一種
• RBM を多段に重ねたような形
37
中間層II
中間層I
入力層
ノード値 バイアス
エネルギー関数
Auto-encoder
• Auto-encoder framework [Lecaum 1987][Bourland 1988][Hinton 1994] : unsupervised feature
construction method の一つ。
• auto-: 「自己の」 auto-encoder を直訳すると自己符号器
• encoder, decoder, reconstruction error の 3 つの要素から構成。
• encoder と decoder の合成写像が入力値を再現するような学習を行う。
• 学習は入力値と出力値の誤差(reconstruction error)を最小化することで行われる。
• この操作によって、入力値をより適切な表現に写像する auto-encoder が得られる。
38
(Auto-)encoder Decoder
Reconstruction
Representation Vector
t-th Input
Vector
Output
Vector
Reconstruction Error
ADGENDA
39
1. Introduction
2. Semi-supervised Embedding
3. Semi-supervised Embedding for Deep Learning
4. Existing Approaches to Deep Learning
5. Experimental Result
6. Conclusion
5. Experimental Result
43
ADGENDA
47
1. Introduction
2. Semi-supervised Embedding
3. Semi-supervised Embedding for Deep Learning
4. Existing Approaches to Deep Learning
5. Experimental Result
6. Conclusion
6. Conclusion
• In this work, we showed how one can improve supervised learning for deep architectures if
one jointly learns an embedding task using unlabeled data.
• Our results both confirm previous findings and generalize them.
• Researchers using shallow architectures already showed two ways of using embedding to
improve generalization
• (i) embedding unlabeled data as a separate pre-processing step
(i.e., first layer training)
• (ii) using embedding as a regularizer (i.e., at the output layer).
• More importantly, we generalized these approaches to the case where we train a semi-
supervised embedding jointly with a supervised deep multi-layer architecture on any (or all)
layers of the network
48
研究内容: Like Prediction
49
研究内容: Like Prediction
50
ディープ・アーキテクチャを用いたグラフの階層的クラスタリング
51
オートエンコーダ
グラフのクラスタ(コミュニ
ティ)
グラフ
DISCUSSION
52

More Related Content

What's hot

Deep belief network.pptx
Deep belief network.pptxDeep belief network.pptx
Deep belief network.pptxSushilAcharya18
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsJinwon Lee
 
Security issues and attacks in wireless sensor networks
Security issues and attacks in wireless sensor networksSecurity issues and attacks in wireless sensor networks
Security issues and attacks in wireless sensor networksMd Waresul Islam
 
Lifelong Learning for Dynamically Expandable Networks
Lifelong Learning for Dynamically Expandable NetworksLifelong Learning for Dynamically Expandable Networks
Lifelong Learning for Dynamically Expandable NetworksNAVER Engineering
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & UnderfittingSOUMIT KAR
 
Vanishing & Exploding Gradients
Vanishing & Exploding GradientsVanishing & Exploding Gradients
Vanishing & Exploding GradientsSiddharth Vij
 
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Sujit Pal
 
Deep Belief nets
Deep Belief netsDeep Belief nets
Deep Belief netsbutest
 
Clustering - Machine Learning Techniques
Clustering - Machine Learning TechniquesClustering - Machine Learning Techniques
Clustering - Machine Learning TechniquesKush Kulshrestha
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 
GANs Presentation.pptx
GANs Presentation.pptxGANs Presentation.pptx
GANs Presentation.pptxMAHMOUD729246
 
Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural networkKIRAN R
 

What's hot (20)

Deep belief network.pptx
Deep belief network.pptxDeep belief network.pptx
Deep belief network.pptx
 
Active learning
Active learningActive learning
Active learning
 
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual RepresentationsPR-231: A Simple Framework for Contrastive Learning of Visual Representations
PR-231: A Simple Framework for Contrastive Learning of Visual Representations
 
Adaptive huffman coding
Adaptive huffman codingAdaptive huffman coding
Adaptive huffman coding
 
LSTM Tutorial
LSTM TutorialLSTM Tutorial
LSTM Tutorial
 
Recurrent neural network
Recurrent neural networkRecurrent neural network
Recurrent neural network
 
Security issues and attacks in wireless sensor networks
Security issues and attacks in wireless sensor networksSecurity issues and attacks in wireless sensor networks
Security issues and attacks in wireless sensor networks
 
Multimodal Deep Learning
Multimodal Deep LearningMultimodal Deep Learning
Multimodal Deep Learning
 
Lifelong Learning for Dynamically Expandable Networks
Lifelong Learning for Dynamically Expandable NetworksLifelong Learning for Dynamically Expandable Networks
Lifelong Learning for Dynamically Expandable Networks
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & Underfitting
 
Vanishing & Exploding Gradients
Vanishing & Exploding GradientsVanishing & Exploding Gradients
Vanishing & Exploding Gradients
 
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
Transfer Learning and Fine Tuning for Cross Domain Image Classification with ...
 
Deep Belief nets
Deep Belief netsDeep Belief nets
Deep Belief nets
 
Autoencoder
AutoencoderAutoencoder
Autoencoder
 
RNN-LSTM.pptx
RNN-LSTM.pptxRNN-LSTM.pptx
RNN-LSTM.pptx
 
Clustering - Machine Learning Techniques
Clustering - Machine Learning TechniquesClustering - Machine Learning Techniques
Clustering - Machine Learning Techniques
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Chaptr 7 (final)
Chaptr 7 (final)Chaptr 7 (final)
Chaptr 7 (final)
 
GANs Presentation.pptx
GANs Presentation.pptxGANs Presentation.pptx
GANs Presentation.pptx
 
Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural network
 

Viewers also liked

論文紹介 Semi-supervised Learning with Deep Generative Models
論文紹介 Semi-supervised Learning with Deep Generative Models論文紹介 Semi-supervised Learning with Deep Generative Models
論文紹介 Semi-supervised Learning with Deep Generative ModelsSeiya Tokui
 
Deep generative learning_icml_part1
Deep generative learning_icml_part1Deep generative learning_icml_part1
Deep generative learning_icml_part1Scyfer
 
Semi-supervised Facial Expressions Annotation Using Co-Training with Fast Pro...
Semi-supervised Facial Expressions Annotation Using Co-Training with Fast Pro...Semi-supervised Facial Expressions Annotation Using Co-Training with Fast Pro...
Semi-supervised Facial Expressions Annotation Using Co-Training with Fast Pro...Mohamed Farouk
 
Data Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityData Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityDaniel Tunkelang
 
Data.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionData.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionMargaret Wang
 
DL Hacks輪読 Semi-supervised Learning with Deep Generative Models
DL Hacks輪読 Semi-supervised Learning with Deep Generative ModelsDL Hacks輪読 Semi-supervised Learning with Deep Generative Models
DL Hacks輪読 Semi-supervised Learning with Deep Generative ModelsYusuke Iwasawa
 
NIPS2015読み会: Ladder Networks
NIPS2015読み会: Ladder NetworksNIPS2015読み会: Ladder Networks
NIPS2015読み会: Ladder NetworksEiichi Matsumoto
 
Uncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game TheoryUncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game TheoryRikiya Takahashi
 
3D Volumetric Data Generation with Generative Adversarial Networks
3D Volumetric Data Generation with Generative Adversarial Networks3D Volumetric Data Generation with Generative Adversarial Networks
3D Volumetric Data Generation with Generative Adversarial NetworksPreferred Networks
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningRahul Jain
 
深層学習時代の自然言語処理
深層学習時代の自然言語処理深層学習時代の自然言語処理
深層学習時代の自然言語処理Yuya Unno
 

Viewers also liked (12)

論文紹介 Semi-supervised Learning with Deep Generative Models
論文紹介 Semi-supervised Learning with Deep Generative Models論文紹介 Semi-supervised Learning with Deep Generative Models
論文紹介 Semi-supervised Learning with Deep Generative Models
 
Deep generative learning_icml_part1
Deep generative learning_icml_part1Deep generative learning_icml_part1
Deep generative learning_icml_part1
 
Semi-supervised Facial Expressions Annotation Using Co-Training with Fast Pro...
Semi-supervised Facial Expressions Annotation Using Co-Training with Fast Pro...Semi-supervised Facial Expressions Annotation Using Co-Training with Fast Pro...
Semi-supervised Facial Expressions Annotation Using Co-Training with Fast Pro...
 
Data Science: A Mindset for Productivity
Data Science: A Mindset for ProductivityData Science: A Mindset for Productivity
Data Science: A Mindset for Productivity
 
Data.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionData.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and prediction
 
DL Hacks輪読 Semi-supervised Learning with Deep Generative Models
DL Hacks輪読 Semi-supervised Learning with Deep Generative ModelsDL Hacks輪読 Semi-supervised Learning with Deep Generative Models
DL Hacks輪読 Semi-supervised Learning with Deep Generative Models
 
NIPS2015読み会: Ladder Networks
NIPS2015読み会: Ladder NetworksNIPS2015読み会: Ladder Networks
NIPS2015読み会: Ladder Networks
 
Uncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game TheoryUncertainty Awareness in Integrating Machine Learning and Game Theory
Uncertainty Awareness in Integrating Machine Learning and Game Theory
 
3D Volumetric Data Generation with Generative Adversarial Networks
3D Volumetric Data Generation with Generative Adversarial Networks3D Volumetric Data Generation with Generative Adversarial Networks
3D Volumetric Data Generation with Generative Adversarial Networks
 
Machine Learning for Dummies
Machine Learning for DummiesMachine Learning for Dummies
Machine Learning for Dummies
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
深層学習時代の自然言語処理
深層学習時代の自然言語処理深層学習時代の自然言語処理
深層学習時代の自然言語処理
 

Similar to Deep Learning via Semi-Supervised Embedding (第 7 回 Deep Learning 勉強会資料; 大澤)

Fundamental of deep learning
Fundamental of deep learningFundamental of deep learning
Fundamental of deep learningStanley Wang
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep LearningPoo Kuan Hoong
 
Big Data Malaysia - A Primer on Deep Learning
Big Data Malaysia - A Primer on Deep LearningBig Data Malaysia - A Primer on Deep Learning
Big Data Malaysia - A Primer on Deep LearningPoo Kuan Hoong
 
Towards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networksTowards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networks曾 子芸
 
character_ANN.ppt
character_ANN.pptcharacter_ANN.ppt
character_ANN.pptHarsh480253
 
Chapter10.pptx
Chapter10.pptxChapter10.pptx
Chapter10.pptxadnansbp
 
Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...
Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...
Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...PyData
 
Deep learning summary
Deep learning summaryDeep learning summary
Deep learning summaryankit_ppt
 
DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101Felipe Prado
 
Transfer Learning (20230516)
Transfer Learning (20230516)Transfer Learning (20230516)
Transfer Learning (20230516)FEG
 
ResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksSeunghyun Hwang
 
Comparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural NetworksComparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural NetworksVincenzo Lomonaco
 
3_Transfer_Learning.pdf
3_Transfer_Learning.pdf3_Transfer_Learning.pdf
3_Transfer_Learning.pdfFEG
 
Exploring Strategies for Training Deep Neural Networks paper review
Exploring Strategies for Training Deep Neural Networks paper reviewExploring Strategies for Training Deep Neural Networks paper review
Exploring Strategies for Training Deep Neural Networks paper reviewVimukthi Wickramasinghe
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerPoo Kuan Hoong
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationGioele Ciaparrone
 
DSRLab seminar Introduction to deep learning
DSRLab seminar   Introduction to deep learningDSRLab seminar   Introduction to deep learning
DSRLab seminar Introduction to deep learningPoo Kuan Hoong
 

Similar to Deep Learning via Semi-Supervised Embedding (第 7 回 Deep Learning 勉強会資料; 大澤) (20)

Network recasting
Network recastingNetwork recasting
Network recasting
 
Fundamental of deep learning
Fundamental of deep learningFundamental of deep learning
Fundamental of deep learning
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
 
Big Data Malaysia - A Primer on Deep Learning
Big Data Malaysia - A Primer on Deep LearningBig Data Malaysia - A Primer on Deep Learning
Big Data Malaysia - A Primer on Deep Learning
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Towards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networksTowards better analysis of deep convolutional neural networks
Towards better analysis of deep convolutional neural networks
 
character_ANN.ppt
character_ANN.pptcharacter_ANN.ppt
character_ANN.ppt
 
Dl
DlDl
Dl
 
Chapter10.pptx
Chapter10.pptxChapter10.pptx
Chapter10.pptx
 
Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...
Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...
Sujit Pal - Applying the four-step "Embed, Encode, Attend, Predict" framework...
 
Deep learning summary
Deep learning summaryDeep learning summary
Deep learning summary
 
DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101DEF CON 24 - Clarence Chio - machine duping 101
DEF CON 24 - Clarence Chio - machine duping 101
 
Transfer Learning (20230516)
Transfer Learning (20230516)Transfer Learning (20230516)
Transfer Learning (20230516)
 
ResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksResNeSt: Split-Attention Networks
ResNeSt: Split-Attention Networks
 
Comparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural NetworksComparing Incremental Learning Strategies for Convolutional Neural Networks
Comparing Incremental Learning Strategies for Convolutional Neural Networks
 
3_Transfer_Learning.pdf
3_Transfer_Learning.pdf3_Transfer_Learning.pdf
3_Transfer_Learning.pdf
 
Exploring Strategies for Training Deep Neural Networks paper review
Exploring Strategies for Training Deep Neural Networks paper reviewExploring Strategies for Training Deep Neural Networks paper review
Exploring Strategies for Training Deep Neural Networks paper review
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentation
 
DSRLab seminar Introduction to deep learning
DSRLab seminar   Introduction to deep learningDSRLab seminar   Introduction to deep learning
DSRLab seminar Introduction to deep learning
 

More from Ohsawa Goodfellow

Open-ended Learning in Symmetric Zero-sum Games @ ICML19
Open-ended Learning in Symmetric Zero-sum Games @ ICML19 Open-ended Learning in Symmetric Zero-sum Games @ ICML19
Open-ended Learning in Symmetric Zero-sum Games @ ICML19 Ohsawa Goodfellow
 
PRML上巻勉強会 at 東京大学 資料 第1章後半
PRML上巻勉強会 at 東京大学 資料 第1章後半PRML上巻勉強会 at 東京大学 資料 第1章後半
PRML上巻勉強会 at 東京大学 資料 第1章後半Ohsawa Goodfellow
 
PRML上巻勉強会 at 東京大学 資料 第1章前半
PRML上巻勉強会 at 東京大学 資料 第1章前半PRML上巻勉強会 at 東京大学 資料 第1章前半
PRML上巻勉強会 at 東京大学 資料 第1章前半Ohsawa Goodfellow
 
Natural Language Processing (Almost) from Scratch(第 6 回 Deep Learning 勉強会資料; 榊)
Natural Language Processing (Almost) from Scratch(第 6 回 Deep Learning 勉強会資料; 榊)Natural Language Processing (Almost) from Scratch(第 6 回 Deep Learning 勉強会資料; 榊)
Natural Language Processing (Almost) from Scratch(第 6 回 Deep Learning 勉強会資料; 榊)Ohsawa Goodfellow
 
Deep Auto-Encoder Neural Networks in Reiforcement Learnning (第 9 回 Deep Learn...
Deep Auto-Encoder Neural Networks in Reiforcement Learnning (第 9 回 Deep Learn...Deep Auto-Encoder Neural Networks in Reiforcement Learnning (第 9 回 Deep Learn...
Deep Auto-Encoder Neural Networks in Reiforcement Learnning (第 9 回 Deep Learn...Ohsawa Goodfellow
 
Semi-Supervised Autoencoders for Predicting Sentiment Distributions(第 5 回 De...
 Semi-Supervised Autoencoders for Predicting Sentiment Distributions(第 5 回 De... Semi-Supervised Autoencoders for Predicting Sentiment Distributions(第 5 回 De...
Semi-Supervised Autoencoders for Predicting Sentiment Distributions(第 5 回 De...Ohsawa Goodfellow
 
Learning Deep Architectures for AI (第 3 回 Deep Learning 勉強会資料; 松尾)
Learning Deep Architectures for AI (第 3 回 Deep Learning 勉強会資料; 松尾)Learning Deep Architectures for AI (第 3 回 Deep Learning 勉強会資料; 松尾)
Learning Deep Architectures for AI (第 3 回 Deep Learning 勉強会資料; 松尾)Ohsawa Goodfellow
 
Deep Learning 勉強会 (Chapter 7-12)
Deep Learning 勉強会 (Chapter 7-12)Deep Learning 勉強会 (Chapter 7-12)
Deep Learning 勉強会 (Chapter 7-12)Ohsawa Goodfellow
 
Deep learning勉強会20121214ochi
Deep learning勉強会20121214ochiDeep learning勉強会20121214ochi
Deep learning勉強会20121214ochiOhsawa Goodfellow
 
第9章 ネットワーク上の他の確率過程
第9章 ネットワーク上の他の確率過程第9章 ネットワーク上の他の確率過程
第9章 ネットワーク上の他の確率過程Ohsawa Goodfellow
 
XLWrapについてのご紹介
XLWrapについてのご紹介XLWrapについてのご紹介
XLWrapについてのご紹介Ohsawa Goodfellow
 
XLWrapについてのご紹介
XLWrapについてのご紹介XLWrapについてのご紹介
XLWrapについてのご紹介Ohsawa Goodfellow
 

More from Ohsawa Goodfellow (12)

Open-ended Learning in Symmetric Zero-sum Games @ ICML19
Open-ended Learning in Symmetric Zero-sum Games @ ICML19 Open-ended Learning in Symmetric Zero-sum Games @ ICML19
Open-ended Learning in Symmetric Zero-sum Games @ ICML19
 
PRML上巻勉強会 at 東京大学 資料 第1章後半
PRML上巻勉強会 at 東京大学 資料 第1章後半PRML上巻勉強会 at 東京大学 資料 第1章後半
PRML上巻勉強会 at 東京大学 資料 第1章後半
 
PRML上巻勉強会 at 東京大学 資料 第1章前半
PRML上巻勉強会 at 東京大学 資料 第1章前半PRML上巻勉強会 at 東京大学 資料 第1章前半
PRML上巻勉強会 at 東京大学 資料 第1章前半
 
Natural Language Processing (Almost) from Scratch(第 6 回 Deep Learning 勉強会資料; 榊)
Natural Language Processing (Almost) from Scratch(第 6 回 Deep Learning 勉強会資料; 榊)Natural Language Processing (Almost) from Scratch(第 6 回 Deep Learning 勉強会資料; 榊)
Natural Language Processing (Almost) from Scratch(第 6 回 Deep Learning 勉強会資料; 榊)
 
Deep Auto-Encoder Neural Networks in Reiforcement Learnning (第 9 回 Deep Learn...
Deep Auto-Encoder Neural Networks in Reiforcement Learnning (第 9 回 Deep Learn...Deep Auto-Encoder Neural Networks in Reiforcement Learnning (第 9 回 Deep Learn...
Deep Auto-Encoder Neural Networks in Reiforcement Learnning (第 9 回 Deep Learn...
 
Semi-Supervised Autoencoders for Predicting Sentiment Distributions(第 5 回 De...
 Semi-Supervised Autoencoders for Predicting Sentiment Distributions(第 5 回 De... Semi-Supervised Autoencoders for Predicting Sentiment Distributions(第 5 回 De...
Semi-Supervised Autoencoders for Predicting Sentiment Distributions(第 5 回 De...
 
Learning Deep Architectures for AI (第 3 回 Deep Learning 勉強会資料; 松尾)
Learning Deep Architectures for AI (第 3 回 Deep Learning 勉強会資料; 松尾)Learning Deep Architectures for AI (第 3 回 Deep Learning 勉強会資料; 松尾)
Learning Deep Architectures for AI (第 3 回 Deep Learning 勉強会資料; 松尾)
 
Deep Learning 勉強会 (Chapter 7-12)
Deep Learning 勉強会 (Chapter 7-12)Deep Learning 勉強会 (Chapter 7-12)
Deep Learning 勉強会 (Chapter 7-12)
 
Deep learning勉強会20121214ochi
Deep learning勉強会20121214ochiDeep learning勉強会20121214ochi
Deep learning勉強会20121214ochi
 
第9章 ネットワーク上の他の確率過程
第9章 ネットワーク上の他の確率過程第9章 ネットワーク上の他の確率過程
第9章 ネットワーク上の他の確率過程
 
XLWrapについてのご紹介
XLWrapについてのご紹介XLWrapについてのご紹介
XLWrapについてのご紹介
 
XLWrapについてのご紹介
XLWrapについてのご紹介XLWrapについてのご紹介
XLWrapについてのご紹介
 

Recently uploaded

How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...Sri Ambati
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesThousandEyes
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationZilliz
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...CzechDreamin
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsExpeed Software
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...Elena Simperl
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀DianaGray10
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaCzechDreamin
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxAbida Shariff
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka DoktorováCzechDreamin
 

Recently uploaded (20)

How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 

Deep Learning via Semi-Supervised Embedding (第 7 回 Deep Learning 勉強会資料; 大澤)

  • 1. eep Learning via Semi-Supervised Embedding Summarized by Shohei Ohsawa The University of Tokyo ohsawa@weblab.t.u-tokyo.ac.jp D
  • 2. Summary • Deep Learning via Semi-Supervised Embedding • Jason Weston (NEC Labs America, USA) • Frédéric Ratle (IGAR, University of Lausanne, Switzerland) • Ronan Collobert (NEC Labs America, USA) • Proceedings of the 25th International Conference on Machine Learning (ICML2008) • Enhancing semi-supervised learning to support deep architecture • Citations: 88 1 Title Author Contents Venue Information
  • 3. ADGENDA 2 1. Introduction 2. Semi-supervised Embedding 3. Semi-supervised Embedding for Deep Learning 4. Existing Approaches to Deep Learning 5. Experimental Result 6. Conclusion
  • 4. ADGENDA 3 1. Introduction 2. Semi-supervised Embedding 3. Semi-supervised Embedding for Deep Learning 4. Existing Approaches to Deep Learning 5. Experimental Result 6. Conclusion
  • 5. 1. Introduction Background • Embedding data into a lower dimensional space are unsupervised dimensionality reduction techniques that have been intensively studied. • Most algorithms are developed with the motivation of producing a useful analysis and visualization tool. 4 Unlabelled data Manifold Embedding
  • 6. 1. Introduction Semi-supervised Learning • Recently, the field of semi-supervised learning [Chapelle 2006], which has the goal of improving generalization on supervised tasks using unlabeled data, has made use of many of these techniques. • Ex.) researchers have used nonlinear embedding or cluster representations (unsupervised) as features for a supervised classifier, with improved results. 5 Labelled data Unlabelled data
  • 7. 1. Introduction Issue of Existing Architectures • Most of these architectures are disjoint and shallow • Joint Methods • Transductive Support Vector Machines (TSVMs) [Vapnik 1998] • LapSVM [Belkin 2006] •   their architecture is still shallow. 6 The unsupervised dimensionality reduction algorithm is trained on unlabeled data separately as a first step a supervised classifier which has a shallow architecture such as a (kernelized) linear model. • [Chapelle 2003][Chapelle 2005] learn a clustering or a distance measure based on a nonlinear manifold embedding as a first step
  • 8. 1. Introduction Semi-supervised Learning • Deep architectures seem a natural choice in hard AI tasks which involve several sub-tasks which can be coded into the layers of the architecture. • As argued by several researchers [Hinton 2006][Bengio 2007] semi-supervised learning is also natural in such a setting as otherwise one is not likely to ever have enough labeled data to perform well. 7
  • 9. 1. Introduction Deep Architecture for Semi-supervised Learning • Several authors have recently proposed methods for using unlabeled data in deep neural network-based architectures. • These methods either perform a greedy layer-wise pre-training of weights using unlabeled data alone followed by supervised fine-tuning (which can be compared to the disjoint shallow techniques for semi-supervised learning described before), • Or learn unsupervised encodings at multiple levels of the architecture jointly with a supervised signal. The basic setup 8 1. Choose an unsupervised learning algorithm. 2. Choose a model with a deep architecture. 3. The unsupervised learning is plugged into any layers of the architecture 4. Train supervised and unsupervised tasks using the same architecture simultaneously
  • 10. 1. Introduction Semi-supervised Learning • The aim is that the unsupervised method will improve accuracy on the task at hand. • However, the unsupervised methods so far proposed for deep architectures are in our opinion somewhat complicated and restricted. • They include: • generative model (a restricted Boltzmann machine) [Hinton 2006] • autoencoders [Bengio 2007] • sparse encoding [Ranzato 2007] • Moreover, in all cases these methods are not compared with, and appear on the surface to be completely different to, algorithms developed by researchers in the field of semi- supervised learning. 9
  • 11. 1. Introduction Objective • In this presentation, we advocate simpler ways of performing deep learning by leveraging existing ideas from semi-supervised algorithms so far developed in shallow architectures. • In particular, we focus on the idea of combining an embedding-based regularizer with a supervised learner to perform semi-supervised learning • Laplacian SVMs [Belkin et al.,2006] • We show that this method can be: • (i) generalized to multi-layer networks and trained by stochastic gradient descent • (ii) is valid in the deep learning framework given above. 10
  • 12. ADGENDA 12 1. Introduction 2. Semi-supervised Embedding 3. Semi-supervised Embedding for Deep Learning 4. Existing Approaches to Deep Learning 5. Experimental Result 6. Conclusion
  • 13. 2. Semi-supervised Embedding Structure Assumption • Structure Assumption: points within the same structure (such as a cluster or a manifold) are likely to have the same label. • Algorithms • cluster kernels [Chapelle et al., 2003] • LDS [Chapelle & Zien,2005] • label propagation [Zhu & Ghahramani, 2002] • LapSVM [Belkin et al., 2006] • To understand these methods we will first review some relevant approaches to linear and nonlinear embedding. 13 Labelled data The labels can be estimated as soon as the points on a same manifold
  • 14. 2. Semi-supervised Embedding Embedding Algorithms 14 minimize s.t. Balancing Constant
  • 15. 2. Semi-supervised Embedding Embedding Algorithms: Multidimensional Scaling (MDS) • A classical algorithm that attempts to preserve the distance between points, whilst embedding them in a lower dimensional space • MDS is equivalent to PCA if the metric is Euclidean [Williams, 2001] 15 Manifold
  • 16. 2. Semi-supervised Embedding Embedding Algorithms: ISOMAP [Tenenbaum 2000] 16 W=1/8
  • 17. 2. Semi-supervised Embedding Embedding Algorithms: Laplacian Eigenmaps [Belkin & Niyogi 2003] 17 Laplacian Kernel
  • 18. 2. Semi-supervised Embedding Semi-superbised Algorithms 19 • L: Labelled Data • U: Unlabelled Data
  • 19. 2. Semi-supervised Embedding Semi-superbised Algorithms: Label Propagation [Zhu & Ghahramani, 2002] 20 RegularizeLoss
  • 20. 2. Semi-supervised Embedding Semi-superbised Algorithms: LapSVM [Belkin et al., 2006] 21 1 10
  • 21. ADGENDA 22 1. Introduction 2. Semi-supervised Embedding 3. Semi-supervised Embedding for Deep Learning 4. Existing Approaches to Deep Learning 5. Experimental Result 6. Conclusion
  • 22. 3. Semi-supervised Embedding for Deep Learning Overview • We would like to use the ideas developed in semi-supervised learning for deep learning. Deep learning consists of learning a model with several layers of non-linear mapping. • We will consider multi-layer networks with M layers of hidden units that give a C- dimensional output vector: • wO: the weights for the output layer • typically the kth layer is defined as • S: a non-linear squashing function such as tanh. 23
  • 23. 3. Semi-supervised Embedding for Deep Learning Overview • Here, we describe a standard fully connected multi-layer network but prior knowledge about a particular problem could lead one to other network designs. • For example in sequence and image recognition time delay and convolutional networks (TDNNs and CNNs) [Le-Cun et al., 1998] have been very successful. • In those approaches one introduces layers that apply convolutions on their input which take into account locality information in the data, i.e. they learn features from image patches or windows within a sequence. 24
  • 24. 3. Semi-supervised Embedding for Deep Learning Three Modes of Embedding in Deep Architectures • The general method we propose for semi-supervised deep learning is to add a semi- supervised regularizer in deep architectures in one of three different modes 25 (a) Output (b) Internal (c) Auxiliary
  • 25. 3. Semi-supervised Embedding for Deep Learning Three Modes of Embedding in Deep Architectures: (a) Output • Add a semi-supervised loss (regularizer) to the supervised loss on the entire network’s output • This is most similar to the shallow techniques described before 26
  • 26. 3. Semi-supervised Embedding for Deep Learning Three Modes of Embedding in Deep Architectures: (b) Internal • Regularize the k th hidden layer (7) directly: 27 • is the output of the network up to the hidden layer.
  • 27. 3. Semi-supervised Embedding for Deep Learning Three Modes of Embedding in Deep Architectures: (c) Auxiliary • Create an auxiliary network which shares the first k layers of the original network but has a new final set of weights: • We train this network to embed unlabeled data simultaneously as we train the original network on labeled data. 28
  • 28. 3. Semi-supervised Embedding for Deep Learning Algorithm 30
  • 29. 3. Semi-supervised Embedding for Deep Learning Labeling unlabeled data as Neighbors 31
  • 30. ADGENDA 36 1. Introduction 2. Semi-supervised Embedding 3. Semi-supervised Embedding for Deep Learning 4. Existing Approaches to Deep Learning 5. Experimental Result 6. Conclusion
  • 31. Deep Boltzmann Machine • ボルツマンマシンの一種 • RBM を多段に重ねたような形 37 中間層II 中間層I 入力層 ノード値 バイアス エネルギー関数
  • 32. Auto-encoder • Auto-encoder framework [Lecaum 1987][Bourland 1988][Hinton 1994] : unsupervised feature construction method の一つ。 • auto-: 「自己の」 auto-encoder を直訳すると自己符号器 • encoder, decoder, reconstruction error の 3 つの要素から構成。 • encoder と decoder の合成写像が入力値を再現するような学習を行う。 • 学習は入力値と出力値の誤差(reconstruction error)を最小化することで行われる。 • この操作によって、入力値をより適切な表現に写像する auto-encoder が得られる。 38 (Auto-)encoder Decoder Reconstruction Representation Vector t-th Input Vector Output Vector Reconstruction Error
  • 33. ADGENDA 39 1. Introduction 2. Semi-supervised Embedding 3. Semi-supervised Embedding for Deep Learning 4. Existing Approaches to Deep Learning 5. Experimental Result 6. Conclusion
  • 35. ADGENDA 47 1. Introduction 2. Semi-supervised Embedding 3. Semi-supervised Embedding for Deep Learning 4. Existing Approaches to Deep Learning 5. Experimental Result 6. Conclusion
  • 36. 6. Conclusion • In this work, we showed how one can improve supervised learning for deep architectures if one jointly learns an embedding task using unlabeled data. • Our results both confirm previous findings and generalize them. • Researchers using shallow architectures already showed two ways of using embedding to improve generalization • (i) embedding unlabeled data as a separate pre-processing step (i.e., first layer training) • (ii) using embedding as a regularizer (i.e., at the output layer). • More importantly, we generalized these approaches to the case where we train a semi- supervised embedding jointly with a supervised deep multi-layer architecture on any (or all) layers of the network 48