2007 NIPS Tutorial on:   Deep Belief Nets Geoffrey Hinton Canadian Institute for Advanced Research & Department of Computer Science University of Toronto
Some things you will learn in this tutorial
A spectrum of machine learning tasks, running from typical statistics at one end to artificial intelligence at the other.
Historical background: first generation neural networks. Sketch of a typical perceptron from the 1960s: input units (e.g. pixels) feed a layer of non-adaptive, hand-coded features, which feed output units (e.g. class labels such as "bomb" vs. "toy").
Second generation neural networks (~1985). An input vector feeds hidden layers, which produce outputs; the outputs are compared with the correct answer to get an error signal, and the error signal is back-propagated to get derivatives for learning.
A temporary digression
What is wrong with back-propagation?
Overcoming the limitations of back-propagation
Belief Nets. A belief net is a directed graph of stochastic variables in which hidden causes generate visible effects. We will use nets composed of layers of stochastic binary variables with weighted connections. Later, we will generalize to other types of variable.
Stochastic binary units (Bernoulli variables). The unit's state is 1 with a probability given by the logistic of its total input: p(s_i = 1) = 1 / (1 + exp(−b_i − Σ_j s_j w_ji)), which runs from 0 to 1 as the total input grows.
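To make this concrete, here is a minimal numpy sketch of a layer of such units; the helper names (`logistic`, `sample_binary_units`) are mine, not from the tutorial:

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_binary_units(states_below, W, b):
    """Sample a layer of stochastic binary (Bernoulli) units:
    p(s_i = 1) = logistic(b_i + sum_j s_j * w_ji)."""
    p_on = logistic(b + states_below @ W)   # probability each unit turns on
    states = (rng.random(p_on.shape) < p_on).astype(float)
    return states, p_on

# Example: 5 parent units driving 3 binary units.
s_parents = rng.integers(0, 2, size=5).astype(float)
W = rng.normal(0, 0.1, size=(5, 3))
b = np.zeros(3)
states, probs = sample_binary_units(s_parents, W, b)
```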
Learning Deep Belief Nets
The learning rule for sigmoid belief nets. For a unit i with parents j, sample states from the posterior and apply the delta rule Δw_ji = ε s_j (s_i − p_i), where ε is the learning rate and p_i = 1 / (1 + exp(−Σ_j s_j w_ji)) is the probability that unit i turns on given the sampled states of its parents.
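A sketch of this delta rule in numpy, assuming (as the slide does) that the parent and child states were sampled from the posterior; `sbn_weight_update` is a hypothetical helper name:

```python
import numpy as np

def sbn_weight_update(s_parents, s_child, W_col, b, lr=0.1):
    """Delta rule for one sigmoid-belief-net unit i with parents j:
    delta_w_ji = lr * s_j * (s_i - p_i), where p_i is the logistic of
    unit i's total input given the sampled parent states."""
    p_i = 1.0 / (1.0 + np.exp(-(b + s_parents @ W_col)))
    return lr * s_parents * (s_child - p_i)

# Toy usage: 4 binary parents, one child unit.
rng = np.random.default_rng(0)
s_parents = rng.integers(0, 2, size=4).astype(float)
dW = sbn_weight_update(s_parents, 1.0, rng.normal(size=4), 0.0)
```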
Explaining away (Judea Pearl). Two independent hidden causes, "earthquake" and "truck hits house" (each with bias −10), both connect with weight +20 to the visible effect "house jumps" (bias −20). Given that the house jumped, the posterior over the two causes is p(1,1) = .0001, p(1,0) = .4999, p(0,1) = .4999, p(0,0) = .0001: observing the effect makes the two causes anti-correlated even though they are independent in the prior.
Why it is usually very hard to learn sigmoid belief nets one layer at a time. To learn the weights W of the first hidden layer we need the posterior over its hidden variables, but that posterior depends not only on the likelihood defined by W but also on the prior defined by all the higher layers of hidden variables.
Two types of generative neural network: connect binary stochastic neurons in a directed acyclic graph and we get a sigmoid belief net; connect them with symmetric connections and we get a Boltzmann machine.
Restricted Boltzmann Machines (Smolensky, 1986, called them "harmoniums"). A layer of hidden units j and a layer of visible units i, with connections only between the two layers: no hidden-to-hidden or visible-to-visible connections, which makes inference easy because the hidden units are conditionally independent given a visible vector.
The energy of a joint configuration (ignoring terms to do with biases): E(v, h) = − Σ_ij v_i h_j w_ij, where v_i is the binary state of visible unit i, h_j is the binary state of hidden unit j, and w_ij is the weight between units i and j.
Weights → Energies → Probabilities
Using energies to define probabilities. The probability of a joint configuration is p(v, h) = exp(−E(v, h)) / Z, where the partition function Z = Σ_{u,g} exp(−E(u, g)) sums over all joint configurations; the probability of a visible vector is obtained by marginalizing, p(v) = Σ_h exp(−E(v, h)) / Z.
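For a tiny RBM these probabilities can be computed exactly by brute-force enumeration, which makes the role of the partition function concrete. A minimal numpy sketch (biases ignored, as on the energy slide; all names are mine):

```python
import numpy as np
from itertools import product

def energy(v, h, W):
    # E(v, h) = -sum_ij v_i h_j w_ij   (bias terms ignored)
    return -(v @ W @ h)

def exact_probs(W, n_v, n_h):
    """Enumerate every joint configuration of a tiny RBM to get
    p(v, h) = exp(-E(v, h)) / Z, where Z is the partition function."""
    configs = [(np.array(v, float), np.array(h, float))
               for v in product([0, 1], repeat=n_v)
               for h in product([0, 1], repeat=n_h)]
    unnorm = np.array([np.exp(-energy(v, h, W)) for v, h in configs])
    Z = unnorm.sum()                      # the partition function
    return configs, unnorm / Z

rng = np.random.default_rng(0)
W = rng.normal(0, 1, size=(3, 2))         # 3 visible, 2 hidden units
configs, p = exact_probs(W, 3, 2)
assert abs(p.sum() - 1.0) < 1e-12         # probabilities sum to one
```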
A picture of the maximum likelihood learning algorithm for an RBM. Start with a training vector on the visible units, then alternate between updating all the hidden units in parallel and updating all the visible units in parallel (t = 0, 1, 2, ..., ∞); run long enough and the chain produces a "fantasy" from the model. The maximum likelihood gradient is Δw_ij = ε (<v_i h_j>⁰ − <v_i h_j>^∞), the difference between the pairwise statistics measured on the data and on the fantasies.
A quick way to learn an RBM. Start with a training vector on the visible units. Update all the hidden units in parallel, update all the visible units in parallel to get a "reconstruction", then update the hidden units again. The update Δw_ij = ε (<v_i h_j>⁰ − <v_i h_j>¹) is not following the gradient of the log likelihood, but it works well; it approximately follows the gradient of another objective function (Carreira-Perpinan & Hinton, 2005).
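A minimal numpy sketch of one such CD-1 update on a minibatch; the function name and toy data are mine, and real runs would add the usual refinements (momentum, weight decay, minibatch scheduling):

```python
import numpy as np

rng = np.random.default_rng(0)
logistic = lambda x: 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_v, b_h, lr=0.1):
    """One contrastive-divergence (CD-1) step for a binary RBM."""
    # Up: sample hidden states given the data.
    ph0 = logistic(b_h + v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Down: reconstruct the visibles, then recompute hidden probabilities.
    pv1 = logistic(b_v + h0 @ W.T)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = logistic(b_h + v1 @ W)
    # <v_i h_j>^0 - <v_i h_j>^1, averaged over the minibatch.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)
    return v1

# Toy usage: 20 random binary "images" of 16 pixels, 8 hidden features.
v = (rng.random((20, 16)) < 0.3).astype(float)
W = rng.normal(0, 0.01, size=(16, 8))
b_v, b_h = np.zeros(16), np.zeros(8)
for _ in range(100):
    cd1_update(v, W, b_v, b_h)
```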
How to learn a set of features that are good for reconstructing images of the digit 2. A 16 × 16 pixel image drives 50 binary feature neurons, which in turn drive a reconstructed 16 × 16 image. When the network is looking at data (reality), increment the weights between each active pixel and each active feature; when it is looking at its reconstruction (better than reality), decrement the weights between each active pixel and each active feature.
The final 50  x  256 weights Each neuron grabs a different feature.
How well can we reconstruct the digit images from the binary feature activations? Reconstruction from activated binary features Data Reconstruction from activated binary features Data New test images from the digit class that the model was trained on Images from an unfamiliar digit class  (the network tries to see every image as a 2)
Three ways to combine probability density models (an underlying theme of the tutorial): mixture (take a weighted average of the distributions), product (multiply the distributions at each point and renormalize), and composition (use the values of the latent variables of one model as the data for learning the next model).
Training a deep network (the main reason RBM's are interesting). First train a layer of features that receive input directly from the pixels. Then treat the activations of those trained features as if they were data and learn a second layer of features, and so on. It can be shown that each time we add another layer of features we improve a variational lower bound on the log probability of the training data.
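A compact numpy sketch of this greedy stacking, with the CD-1 update from the previous sketch inlined into a `train_rbm` helper (names and toy sizes are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
logistic = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=50, lr=0.1):
    """Train one binary RBM with CD-1 (same update as the earlier sketch)."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        ph0 = logistic(b_h + data @ W)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = logistic(b_v + h0 @ W.T)
        ph1 = logistic(b_h + pv1 @ W)
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / len(data)
        b_v += lr * (data - pv1).mean(axis=0)
        b_h += lr * (ph0 - ph1).mean(axis=0)
    return W, b_h

def train_stack(data, layer_sizes):
    """Greedy layer-by-layer learning: each trained layer's hidden
    probabilities become the 'data' for the next RBM in the stack."""
    weights, layer_input = [], data
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(layer_input, n_hidden)
        weights.append((W, b_h))
        layer_input = logistic(b_h + layer_input @ W)   # up-pass
    return weights

data = (rng.random((100, 16)) < 0.3).astype(float)
stack = train_stack(data, [12, 8, 4])
```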
The generative model after learning 3 layers. The top two layers (h2 and h3) form an undirected associative memory; the connections from h2 to h1 and from h1 to the data are directed. To generate data, first get an equilibrium sample from the top-level associative memory by alternating Gibbs sampling, then perform a top-down pass to generate h1 and finally the data.
Why does greedy learning work? An aside: averaging factorial distributions. If you average two factorial distributions, you do not, in general, get a factorial distribution.
Why does greedy learning work? The learning divides into two tasks. Task 1: model the data distribution on the visible units. Task 2: model the aggregated posterior distribution on the hidden units, i.e. the distribution over hidden vectors that the first RBM produces when driven by the data. The next module in the stack takes on Task 2.
Why does greedy learning work? The weights, W, in the bottom level RBM define p(v|h) and they also, indirectly, define p(h). So we can express the RBM model as p(v) = Σ_h p(h) p(v|h). If we leave p(v|h) alone and improve p(h), we will improve p(v). To improve p(h), we need it to be a better model of the aggregated posterior distribution over hidden vectors produced by applying W to the data.
Which distributions are factorial in a directed belief net?
Why does greedy learning fail in a directed module? The same two tasks arise: Task 1, model the data distribution on the visible units; Task 2, model the aggregated posterior distribution on the hidden units.
A model of digit recognition 2000 top-level neurons 500 neurons 500 neurons   28 x 28 pixel  image   10 label neurons   The model learns to generate combinations of labels and images.  To perform recognition we start with a neutral state of the label units and do an up-pass from the image followed by a few iterations of the top-level associative memory. The top two layers form an associative memory  whose  energy landscape models the low dimensional manifolds of the digits. The energy valleys have names
Fine-tuning with a contrastive version of the "wake-sleep" algorithm
Show the movie of the network generating digits (available at www.cs.toronto.edu/~hinton)
Samples generated by letting the associative memory run with one label clamped. There are 1000 iterations of alternating Gibbs sampling between samples.
Examples of correctly recognized handwritten digits that the neural network had never seen before. It's very good.
How well does it discriminate on the MNIST test set with no extra information about geometric distortions?
Unsupervised "pre-training" also helps for models that have more data and better priors. Back-propagation alone: 0.49%. Unsupervised layer-by-layer pre-training followed by backprop: 0.39% (record).
Another view of why layer-by-layer learning works
An infinite sigmoid belief net that is equivalent to an RBM. The net has infinitely many layers (v0, h0, v1, h1, v2, h2, and so on upward), all using the same weight matrix W, transposed in alternate layers; the distribution it defines on v0 is exactly the distribution the RBM defines on its visible units.
Inference in a directed net with replicated weights. Starting at v0, a single bottom-up pass (v0 → h0 → v1 → h1 → ...) gives unbiased samples from the posterior: the replicated weights implement a complementary prior that exactly cancels explaining away, so the posterior at each layer is factorial.
Learning a deep directed network. First learn with all of the weight matrices tied together; this is exactly equivalent to learning a single RBM on v0 and h0.
Then freeze the bottom layer of weights in both directions and learn the remaining weights, still tied together; this is equivalent to learning another RBM, using the aggregated posterior distribution over h0 as its data.
How many layers should we use and how wide should they be? (I am indebted to Karl Rove for this slide)
What happens when the weights in higher layers become different from the weights in the first layer?
A stack of RBM's (Yee-Whye Teh's idea). The layers are v, h1, h2, ..., hL, with an RBM between each adjacent pair of layers.
The variational bound. Each time we replace the prior over the hidden units by a better prior, we win by the difference in the probability assigned to the data. Cancelling out all of the partition functions except the top one, and replacing log probabilities by goodnesses, gives a bound with simple derivatives, and these yield a more justifiable fine-tuning algorithm than contrastive wake-sleep.
Summary so far
Overview of the rest of the tutorial
BREAK
Fine-tuning for discrimination
Why backpropagation works better after greedy pre-training
First, model the distribution of digit images. A 28 x 28 pixel image feeds 500 units, then 500 units, then 2000 units; the top two layers form a restricted Boltzmann machine whose free energy landscape should model the low dimensional manifolds of the digits. The network learns a density model for unlabeled digit images, and when we generate from the model we get things that look like real digits of all classes. But do the hidden features really help with digit discrimination? Add 10 softmaxed units to the top and do backpropagation.
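A minimal sketch of the discriminative step, assuming the top-level feature activations have already been computed by the pre-trained stack. It trains only the added softmax layer by gradient descent on the cross-entropy; full fine-tuning would also backpropagate these gradients into the feature layers. All names and toy data are mine:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))   # numerically stable
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_layer(features, labels, n_classes, epochs=200, lr=0.5):
    """Fit the added softmaxed label units on top of pre-trained features."""
    n_feat = features.shape[1]
    W = np.zeros((n_feat, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        p = softmax(b + features @ W)
        grad = p - onehot                  # d(cross-entropy)/d(logits)
        W -= lr * features.T @ grad / len(features)
        b -= lr * grad.mean(axis=0)
    return W, b

# Toy usage with random stand-ins for 2000 top-level features.
feats = rng.random((100, 2000))
labels = rng.integers(0, 10, size=100)
W, b = train_softmax_layer(feats, labels, 10)
```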
Results on permutation-invariant MNIST task
Combining deep belief nets with Gaussian processes
Learning to extract the orientation of a face patch  (Salakhutdinov & Hinton, NIPS 2007)
The training and test sets: 11,000 unlabeled cases; 100, 500, or 1000 labeled cases; the test cases are face patches from new people.
The root mean squared error in the orientation when combining GP's with deep belief nets:

                                            100 labels   500 labels   1000 labels
GP on the pixels                                22.2         17.2          16.3
GP on top-level features                        17.9         12.7          11.2
GP on top-level features with fine-tuning       15.2          7.2           6.4

Conclusion: The deep features are much better than the pixels. Fine-tuning helps a lot.
Modeling real-valued data
The free-energy of a mean-field logistic unit: F = energy − entropy, plotted as a function of the unit's output between 0 and 1.
An RBM with real-valued visible units. With Gaussian visible units the energy becomes E(v, h) = Σ_i (v_i − b_i)² / 2σ_i² − Σ_j b_j h_j − Σ_ij (v_i / σ_i) h_j w_ij: a parabolic containment function for each visible unit plus an energy gradient produced by the total input to the visible unit. Welling et al. (2005) show how to extend RBM's to the exponential family. See also Bengio et al. (2007).
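A sketch of CD-1 for this Gaussian-visible RBM, assuming unit variances (σ_i = 1) so the visible reconstruction is just its top-down mean; the function name and toy data are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
logistic = lambda x: 1.0 / (1.0 + np.exp(-x))

def gaussian_rbm_cd1(v0, W, b_v, b_h, lr=0.001):
    """CD-1 for an RBM with Gaussian (real-valued) visible units and
    binary hidden units, assuming unit visible variances; the visible
    reconstruction is then just the mean b_v + h @ W.T."""
    ph0 = logistic(b_h + v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    v1 = b_v + h0 @ W.T            # mean-field reconstruction, no noise
    ph1 = logistic(b_h + v1 @ W)
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)

# Toy usage on standardized real-valued data.
v = rng.normal(0, 1, size=(50, 16))
W = rng.normal(0, 0.01, size=(16, 8))
b_v, b_h = np.zeros(16), np.zeros(8)
for _ in range(100):
    gaussian_rbm_cd1(v, W, b_v, b_h)
```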
Deep Autoencoders (Hinton & Salakhutdinov, 2006). A 28x28 image feeds an encoder of 1000, 500, and 250 neurons into a central code layer of 30 linear units, and a mirror-image decoder of 250, 500, and 1000 neurons reconstructs the 28x28 image.
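A sketch of the unrolled architecture in numpy, with random weights standing in for the RBM pre-training the method actually uses; only the forward pass is shown, and fine-tuning would backpropagate reconstruction error through it:

```python
import numpy as np

rng = np.random.default_rng(0)
logistic = lambda x: 1.0 / (1.0 + np.exp(-x))

# Layer sizes from the slide: 28x28 -> 1000 -> 500 -> 250 -> 30 (linear),
# then the mirror-image decoder back to 28x28.
sizes = [784, 1000, 500, 250, 30]
enc = [rng.normal(0, 0.01, size=(a, b)) for a, b in zip(sizes, sizes[1:])]
dec = [W.T.copy() for W in reversed(enc)]   # "unrolled" tied initialization

def encode(x):
    for W in enc[:-1]:
        x = logistic(x @ W)
    return x @ enc[-1]                      # 30 linear code units

def decode(code):
    x = logistic(code @ dec[0])
    for W in dec[1:]:
        x = logistic(x @ W)                 # logistic pixel outputs
    return x

x = rng.random((5, 784))
recon = decode(encode(x))                   # fine-tune on ||x - recon||
```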
A comparison of methods for compressing digit images to 30 real numbers. real  data 30-D  deep auto 30-D logistic PCA 30-D  PCA
Do the 30-D codes found by the deep autoencoder preserve the class structure of the data?
entirely unsupervised except for the colors
Retrieving documents that are similar to a query document
How to compress the count vector. A 2000-dimensional input vector of word counts passes through layers of 500 and 250 neurons to a 10-dimensional code, then back out through 250 and 500 neurons to an output vector of 2000 reconstructed counts.
Performance of the autoencoder at document retrieval
Proportion of retrieved documents in same class as query Number of documents retrieved
First compress all documents to 2 numbers using a type of PCA  Then use different colors for different document categories
First compress all documents to 2 numbers.  Then use different colors for different document categories
Finding binary codes for documents. Train the same kind of autoencoder (2000 word counts → 500 → 250 → 30 → 250 → 500 → 2000 reconstructed counts), but add noise to the inputs of the 30 code units during training; to resist the noise, their activities become bimodal, so after training each code unit can be thresholded to give a 30-bit binary code for the document.
Semantic hashing: using a deep autoencoder as a hash function for finding approximate matches (Salakhutdinov & Hinton, 2007). The 30-bit code of a document is used directly as a memory address, so semantically similar documents land at nearby addresses: "supermarket search".
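A sketch of the lookup side of semantic hashing: pack each 30-bit code into a memory address, then retrieve a shortlist by probing the query address and all addresses within a small Hamming distance. The random codes stand in for real autoencoder outputs, and all names are mine:

```python
import numpy as np
from itertools import combinations
from collections import defaultdict

def code_to_address(code_bits):
    """Pack a 30-bit binary code into an integer memory address."""
    return int("".join("1" if b else "0" for b in code_bits), 2)

def build_table(codes):
    table = defaultdict(list)
    for doc_id, bits in enumerate(codes):
        table[code_to_address(bits)].append(doc_id)
    return table

def query(table, bits, max_flips=2):
    """'Supermarket search': look up the query address and every address
    within a small Hamming distance by flipping a few bits."""
    hits = list(table.get(code_to_address(bits), []))
    for r in range(1, max_flips + 1):
        for idx in combinations(range(len(bits)), r):
            flipped = bits.copy()
            flipped[list(idx)] ^= True
            hits += table.get(code_to_address(flipped), [])
    return hits

rng = np.random.default_rng(0)
codes = rng.random((1000, 30)) < 0.5    # stand-in for autoencoder codes
table = build_table(codes)
shortlist = query(table, codes[0].copy())
```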
How good is a shortlist found this way?
Time series models
Time series models
The conditional RBM model (Sutskever & Hinton 2007). The visible units at the previous time steps (t−2, t−1) send directed, conditioning connections to the visible units and to the hidden units at time t; given the recent history, the current time step behaves like an ordinary RBM.
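A sketch of one reading of the conditioning mechanism: the previous visible frames contribute additive "dynamic biases" to the current visible and hidden units, after which the current frame behaves like an ordinary RBM. The matrices A and B and all names are my own illustration, not the paper's notation:

```python
import numpy as np

rng = np.random.default_rng(0)

def crbm_dynamic_biases(history, A, B, b_v, b_h):
    """Previous visible frames (flattened into `history`) shift the
    biases of the current visible and hidden units."""
    return b_v + history @ A, b_h + history @ B

# Toy usage: 2 past frames of 10 visible units conditioning frame t.
n_v, n_h, n_hist = 10, 8, 20
A = rng.normal(0, 0.01, size=(n_hist, n_v))   # history -> visibles
B = rng.normal(0, 0.01, size=(n_hist, n_h))   # history -> hiddens
history = rng.random(n_hist)
dyn_bv, dyn_bh = crbm_dynamic_biases(history, A, B,
                                     np.zeros(n_v), np.zeros(n_h))
# With the history fixed, these dynamic biases replace b_v and b_h, so
# CD-1 learning and alternating Gibbs generation apply unchanged.
```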
Why the autoregressive connections do not cause problems
Generating from a learned model. Keep the already-generated visible frames at t−2 and t−1 fixed and run alternating Gibbs sampling between the hidden units and the visible units at time t to generate the next frame.
Stacking temporal RBM's
An application to modeling motion capture data (Taylor, Roweis & Hinton, 2007)
Modeling multiple types of motion
Show Graham Taylor's movies available at www.cs.toronto.edu/~hinton
Generating the parts of an object. The pose parameters plus the identity ("square") give a sloppy top-down activation of parts, which is then cleaned up using known interactions between the parts: features with top-down support survive. It's like soldiers on a parade ground.
Semi-restricted Boltzmann Machines. Like an RBM, with hidden units j connected to visible units i, but with lateral connections added between the visible units.
Learning a semi-restricted Boltzmann Machine. 1. Start with a training vector on the visible units. 2. Update all of the hidden units in parallel. 3. Repeatedly update all of the visible units in parallel using mean-field updates (with the hiddens fixed) to get a "reconstruction". 4. Update all of the hidden units again. The weights are updated from the data and reconstruction statistics as before, and a lateral weight between visible units i and k is updated in the same way from the pairwise visible statistics: Δl_ik = ε (<s_i s_k>⁰ − <s_i s_k>¹).
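A numpy sketch of step 3, the damped mean-field reconstruction of the visible units with the hiddens fixed (this anticipates the damping formula on the next slide); all names are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
logistic = lambda x: 1.0 / (1.0 + np.exp(-x))

def mean_field_reconstruction(h, W, L, b_v, n_steps=10, damping=0.5):
    """With the hidden states fixed, repeatedly apply damped mean-field
    updates to the visible units, which receive top-down input through W
    and lateral input through L (L symmetric with zero diagonal)."""
    p_v = logistic(b_v + h @ W.T)                 # initial guess
    for _ in range(n_steps):
        total_input = b_v + h @ W.T + p_v @ L     # top-down + lateral
        p_v = damping * p_v + (1 - damping) * logistic(total_input)
    return p_v

# Toy usage: 16 visible units with lateral weights, 8 hidden units.
n_v, n_h = 16, 8
W = rng.normal(0, 0.1, size=(n_v, n_h))
L = rng.normal(0, 0.1, size=(n_v, n_v))
L = (L + L.T) / 2                                 # make symmetric
np.fill_diagonal(L, 0.0)                          # no self-connections
h = (rng.random(n_h) < 0.5).astype(float)
recon = mean_field_reconstruction(h, W, L, np.zeros(n_v))
```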
Learning in Semi-restricted Boltzmann Machines. The mean-field update for visible unit i uses damping: p_i ← λ p_i + (1 − λ) σ(x_i), where x_i is the total input to i (top-down plus lateral) and λ is the damping coefficient.
Results on modeling natural image patches using a stack of RBM's (Osindero and Hinton). The stack: 400 Gaussian units (the image patch), a hidden MRF with 2000 units, a hidden MRF with 500 units, and 1000 top-level units with no MRF; the connections between the lower layers are directed and the top-level connections are undirected.
Without lateral connections real data samples from model
With lateral connections real data samples from model
A funny way to use an MRF
Why do we whiten data?
Whitening the learning signal instead of the data
Towards a more powerful, multi-linear stackable learning module
Higher order Boltzmann machines (Sejnowski, ~1986)
A picture of a conditional, higher-order Boltzmann machine (Hinton & Lang, 1985). Retina-based features, object-based features, and the viewing transform interact through three-way connections, so the viewing transform gates the mapping between retina-based and object-based features.
Using conditional higher-order Boltzmann machines to model image transformations (Memisevic and Hinton, 2007). The three-way units connect image(t), image(t+1), and the image transformation that relates them.
Readings on deep belief nets