The document discusses deep learning and unsupervised feature learning using restricted Boltzmann machines (RBMs). RBMs are stochastic neural networks that can learn representations of data through unsupervised learning. The document outlines how RBMs work, how their parameters are learned through approximate maximum likelihood methods, and how RBMs have been applied to learn features from images, text, and collaborative filtering data.
P05: Deep Boltzmann Machines (CVPR 2012 Tutorial: Deep Learning Methods for Vision)
1. Deep Learning
Russ Salakhutdinov
Department of Statistics and Computer Science, University of Toronto
2. Mining for Structure
Massive increase in both computational power and the amount of data available from the web, video cameras, and laboratory measurements: Images & Video, Text & Language, Speech & Audio, Gene Expression, Relational Data / Social Networks, Climate Change, Product Recommendation, Geological Data. Mostly unlabeled.
• Develop statistical models that can discover underlying structure, cause, or statistical correlation from data in an unsupervised or semi-supervised way.
• Multiple application domains.
3. Talk Roadmap
• Unsupervised Feature Learning: Restricted Boltzmann Machines (RBM), Deep Belief Networks, Deep Boltzmann Machines (DBM)
• Transfer Learning with Deep Models
• Multimodal Learning
4. Restricted Boltzmann Machines
Bipartite structure: stochastic binary visible variables v (the image) are connected to stochastic binary hidden variables h.
The energy of the joint configuration is E(v, h; θ) = −v⊤Wh − b⊤v − a⊤h, where θ = {W, a, b} are the model parameters.
The probability of a joint configuration is given by the Boltzmann distribution, P(v, h; θ) = exp(−E(v, h; θ)) / Z(θ), where Z(θ) is the partition function and the energy terms play the role of potential functions.
Related model families: Markov random fields, Boltzmann machines, log-linear models.
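To make these two formulas concrete, here is a minimal NumPy sketch (my own illustration, not from the slides; the names W, b, a follow the notation above). It evaluates the energy of one configuration and, for a toy-sized model only, the partition function by brute-force enumeration:

```python
import numpy as np

def rbm_energy(v, h, W, b, a):
    """Energy of a joint configuration: E(v, h) = -v'Wh - b'v - a'h."""
    return -v @ W @ h - b @ v - a @ h

def partition_function(W, b, a):
    """Z(theta): sums exp(-E) over all 2^(nv+nh) configurations.
    Only feasible for tiny models; this is exactly why learning is hard."""
    nv, nh = W.shape
    Z = 0.0
    for i in range(2 ** nv):
        v = np.array([(i >> k) & 1 for k in range(nv)], dtype=float)
        for j in range(2 ** nh):
            h = np.array([(j >> k) & 1 for k in range(nh)], dtype=float)
            Z += np.exp(-rbm_energy(v, h, W, b, a))
    return Z
```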
5. Restricted Boltzmann Machines
Bipartite structure. Restricted: no interactions between hidden variables, so inferring the distribution over the hidden variables given the image (the visible variables) is easy.
The posterior factorizes over hidden units and is easy to compute: P(h | v) = ∏_j P(h_j | v), with P(h_j = 1 | v) = σ(∑_i W_ij v_i + a_j).
Similarly for the visibles: P(v_i = 1 | h) = σ(∑_j W_ij h_j + b_i).
Related model families: Markov random fields, Boltzmann machines, log-linear models.
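The factorized conditionals are one line each in code; a sketch under the same notation (σ is the logistic function):

```python
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, a):
    # Factorizes over hidden units: P(h_j = 1 | v) = sigmoid(sum_i W_ij v_i + a_j)
    return sigmoid(v @ W + a)

def p_v_given_h(h, W, b):
    # Symmetrically: P(v_i = 1 | h) = sigmoid(sum_j W_ij h_j + b_i)
    return sigmoid(W @ h + b)
```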
6. Model Learning
Given a set of i.i.d. training examples D = {v(1), ..., v(N)}, we want to learn the model parameters θ = {W, a, b}.
Maximize the (penalized) log-likelihood objective: the log-likelihood of the data plus a regularization term.
The derivative of the log-likelihood is a data-dependent expectation minus a model expectation; the latter is difficult to compute, since it involves exponentially many configurations.
7. Model Learning
Since the exact derivative of the log-likelihood objective is intractable, we use approximate maximum likelihood learning:
• Contrastive Divergence (Hinton 2000)
• Pseudo-Likelihood (Besag 1977)
• MCMC-MLE estimator (Geyer 1991)
• Composite Likelihoods (Lindsay 1988; Varin 2008)
• Tempered MCMC (Salakhutdinov, NIPS 2009)
• Adaptive MCMC (Salakhutdinov, ICML 2010)
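Of these, Contrastive Divergence is the most widely used in practice. A minimal CD-1 sketch (my own illustration, reusing the sigmoid helper above; a and b are hidden and visible biases): the positive phase uses data-driven hidden probabilities, the negative phase a single Gibbs reconstruction.

```python
def cd1_update(v0, W, b, a, lr=0.01):
    """One CD-1 parameter update from a batch of binary data v0, shape (N, nv)."""
    # Positive phase: data-dependent statistics.
    ph0 = sigmoid(v0 @ W + a)
    h0 = (np.random.rand(*ph0.shape) < ph0).astype(float)
    # Negative phase: one step of Gibbs sampling (the "reconstruction").
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + a)
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b += lr * (v0 - pv1).mean(axis=0)
    a += lr * (ph0 - ph1).mean(axis=0)
    return W, b, a
```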
8. RBMs for Images
Define energy functions for various data modalities. For real-valued images, the Gaussian-Bernoulli RBM uses Gaussian visible variables and Bernoulli hidden variables.
9. RBMs for Images
Gaussian-Bernoulli RBM, interpretation: a mixture of an exponential number of Gaussians, P(v) = ∑_h P(h) N(v; μ(h), Σ), where P(h) is an implicit prior over hidden configurations and each configuration h picks out one Gaussian component.
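For reference, a sketch of one common parameterization of the Gaussian-Bernoulli energy (the slides do not pin down the exact form, so treat the scaling by sigma as an assumption):

```python
def gbrbm_energy(v, h, W, b, a, sigma):
    """Gaussian-Bernoulli RBM energy: real-valued v, binary h."""
    quad = np.sum((v - b) ** 2 / (2 * sigma ** 2))   # Gaussian term on visibles
    inter = (v / sigma) @ W @ h                      # visible-hidden interaction
    return quad - inter - a @ h

# Conditionals: h stays Bernoulli; v given h is Gaussian
# with mean b + sigma * (W @ h) and variance sigma**2.
```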
10. RBMs for Images and Text
Images: Gaussian-Bernoulli RBM. Learned features (out of 10,000) from 4 million unlabelled images.
Text: Multinomial-Bernoulli RBM over bag-of-words input. Reuters dataset: 804,414 unlabeled newswire stories. Learned features act as "topics":
• russian: russia, moscow, yeltsin, soviet
• clinton: house, president, bill, congress
• computer: system, product, software, develop
• trade: country, import, world, economy
• stock: wall, street, point, dow
11. Collaborative Filtering
Multinomial visible variables: user ratings. Bernoulli hidden variables: user preferences. Netflix dataset: 480,189 users, 17,770 movies, over 100 million ratings.
Learned features act as "genres", e.g.:
• Fahrenheit 9/11, Bowling for Columbine, The People vs. Larry Flynt, Canadian Bacon, La Dolce Vita
• Independence Day, The Day After Tomorrow, Con Air, Men in Black II, Men in Black
• Friday the 13th, The Texas Chainsaw Massacre, Children of the Corn, Child's Play, The Return of Michael Myers
• Scary Movie, Naked Gun, Hot Shots!, American Pie, Police Academy
State-of-the-art performance on the Netflix dataset. Relates to Probabilistic Matrix Factorization (Salakhutdinov & Mnih, ICML 2007).
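A sketch of how ratings map onto such an RBM (my own illustration of the multinomial-visible idea, not the exact Netflix code): each rated movie is a K-way softmax visible unit, and the hidden units are shared across one user's ratings.

```python
def cf_hidden_probs(V, W, a):
    """V: (n_movies, K) one-hot ratings for one user (all-zero rows = unrated).
    W: (n_movies, K, n_hidden) softmax-visible weights; a: hidden biases."""
    return sigmoid(np.tensordot(V, W, axes=([0, 1], [0, 1])) + a)

def cf_predict_rating(h, W, bv, movie):
    """Softmax over the K rating values of one movie, given hidden states h."""
    scores = bv[movie] + W[movie] @ h        # shape (K,)
    e = np.exp(scores - scores.max())
    return e / e.sum()
```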
12. Multiple Application Domains
• Natural Images
• Text/Documents
• Collaborative Filtering / Matrix Factorization
• Video (Langford et al., ICML 2009; Lee et al.)
• Motion Capture (Taylor et al., NIPS 2007)
• Speech Perception (Dahl et al., NIPS 2010; Lee et al., NIPS 2010)
Same learning algorithm, multiple input domains. But there are limitations on the types of structure that can be represented by a single layer of low-level features!
13. Talk Roadmap
• Unsupervised Feature Learning: Restricted Boltzmann Machines (RBM), Deep Belief Networks, Deep Boltzmann Machines (DBM)
• Transfer Learning with Deep Models
• Multimodal Learning
14. Deep Belief Network
Input: pixels of an image. Low-level features: edges, built from unlabeled inputs.
15. Deep Belief Network
Unsupervised feature learning: internal representations capture higher-order statistical structure.
Input: pixels. Low-level features: edges, built from unlabeled inputs. Higher-level features: combinations of edges.
(Hinton et al., Neural Computation 2006)
16. Deep Belief Network
A DBN stacks a sigmoid belief network below an RBM: the top two layers (h2, h3 with weights W3) form an RBM, while the lower layers (v, h1, h2 with weights W1, W2) form directed sigmoid belief networks. The joint probability distribution factorizes as
P(v, h1, h2, h3) = P(v | h1) P(h1 | h2) P(h2, h3),
where P(h2, h3) is the top-level RBM.
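Because of this factorization, drawing a sample from a DBN is ancestral: run Gibbs in the top RBM, then make a single top-down pass through the directed layers. A sketch (my own; per-layer biases bv, b1, b2, b3 are an assumption):

```python
def sample_dbn(W1, W2, W3, bv, b1, b2, b3, gibbs_steps=200):
    """W1: (n_v, n_h1), W2: (n_h1, n_h2), W3: (n_h2, n_h3)."""
    h2 = (np.random.rand(W3.shape[0]) < 0.5).astype(float)
    for _ in range(gibbs_steps):             # equilibrate the top RBM P(h2, h3)
        h3 = (np.random.rand(W3.shape[1]) < sigmoid(h2 @ W3 + b3)).astype(float)
        h2 = (np.random.rand(W3.shape[0]) < sigmoid(W3 @ h3 + b2)).astype(float)
    # Top-down ancestral pass through the sigmoid belief layers.
    h1 = (np.random.rand(W2.shape[0]) < sigmoid(W2 @ h2 + b1)).astype(float)
    return sigmoid(W1 @ h1 + bv)             # mean of P(v | h1)
```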
17. DBNs for Classification
Pretraining, unrolling, fine-tuning: a stack of RBMs (hidden layers of 500, 500, and 2000 units with weights W1-W4) is pretrained layer by layer, unrolled, and a 10-way softmax output is attached for fine-tuning.
• After layer-by-layer unsupervised pretraining, discriminative fine-tuning by backpropagation achieves an error rate of 1.2% on MNIST. SVMs get 1.4% and randomly initialized backprop gets 1.6%.
• Clearly, unsupervised learning helps generalization. It ensures that most of the information in the weights comes from modeling the input data.
19. Deep Generative Model
Model P(document) on bag-of-words input, trained unsupervised on the Reuters dataset of 804,414 newswire stories.
[Figure: 2-D embedding of documents, with clusters for European Community Monetary/Economic, Interbank Markets, Energy Markets, Disasters and Accidents, Leading Economic Indicators, Legal/Judicial, Government Borrowings, and Accounts/Earnings.]
(Hinton & Salakhutdinov, Science 2006)
20. Information Retrieval
[Figure: 2-D LSA space with the same topic clusters: Interbank Markets, European Community Monetary/Economic, Energy Markets, Disasters and Accidents, Leading Economic Indicators, Legal/Judicial, Government Borrowings, Accounts/Earnings.]
• The Reuters Corpus Volume II contains 804,414 newswire stories (randomly split into 402,207 training and 402,207 test).
• "Bag-of-words": each article is represented as a vector containing the counts of the 2000 most frequently used words in the training set.
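The representation itself is easy to reproduce; a sketch (whitespace tokenization is a simplifying assumption):

```python
from collections import Counter

def bag_of_words(docs, vocab_size=2000):
    """Represent each article by counts over the most frequent training words."""
    freq = Counter(w for doc in docs for w in doc.split())
    vocab = [w for w, _ in freq.most_common(vocab_size)]
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for d, doc in enumerate(docs):
        for w in doc.split():
            if w in index:
                X[d, index[w]] += 1
    return X, vocab
```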
21. Semantic Hashing
[Figure: a learned hashing function maps a document to a point in a binary address space, so that semantically similar documents land at nearby addresses.]
• Learn to map documents into semantic 20-D binary codes.
• Retrieve similar documents stored at the nearby addresses with no search at all.
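The retrieval step the slide describes is literally address arithmetic; a sketch (20-bit codes as above, Hamming radius 1 chosen for illustration):

```python
def code_to_address(bits):
    """Pack a 20-D binary code into an integer memory address."""
    return int("".join(str(int(b)) for b in bits), 2)

def neighbors_within_radius(addr, n_bits=20, radius=1):
    """Addresses whose codes differ in at most `radius` bits: retrieval
    becomes a direct memory lookup, with no search over the corpus."""
    near = [addr]
    if radius >= 1:
        near += [addr ^ (1 << k) for k in range(n_bits)]
    return near
```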
22. Searching Large Image Databases using Binary Codes
• Map images into binary codes for fast retrieval.
• Small Codes: Torralba, Fergus, Weiss, CVPR 2008
• Spectral Hashing: Y. Weiss, A. Torralba, R. Fergus, NIPS 2008
• Kulis and Darrell, NIPS 2009; Gong and Lazebnik, CVPR 2011
• Norouzi and Fleet, ICML 2011
23. Talk Roadmap
• Unsupervised Feature Learning: Restricted Boltzmann Machines (RBM), Deep Belief Networks, Deep Boltzmann Machines (DBM)
• Transfer Learning with Deep Models
• Multimodal Learning
24. DBNs vs. DBMs
[Figure: a Deep Belief Network (directed lower layers) next to a Deep Boltzmann Machine (all connections undirected), each with layers v, h1, h2, h3 and weights W1, W2, W3.]
DBNs are hybrid models:
• Inference in DBNs is problematic due to explaining away.
• Only greedy pretraining, no joint optimization over all layers.
• Approximate inference is feed-forward: no bottom-up and top-down interaction.
We therefore introduce a new class of models called Deep Boltzmann Machines.
25. Mathematical Formulation
Deep Boltzmann Machine with model parameters θ = {W1, W2, W3}:
• Dependencies between hidden variables.
• All connections are undirected.
• Bottom-up and top-down: the conditional for a middle layer combines input from the layer below and the layer above.
Unlike many existing feed-forward models: ConvNet (LeCun), HMAX (Poggio et al.), Deep Belief Nets (Hinton et al.).
26.-27. Mathematical Formulation
[Figures: side-by-side comparison of a Neural Network, a Deep Boltzmann Machine, and a Deep Belief Network, each with input v, layers h1, h2, h3, weights W1, W2, W3, and an output; the second figure highlights the direction of inference in each model.]
Unlike many existing feed-forward models: ConvNet (LeCun), HMAX (Poggio), Deep Belief Nets (Hinton).
28. Mathematical Formulation
Deep Boltzmann Machine with model parameters θ; dependencies between hidden variables.
Maximum likelihood learning follows the standard rule for undirected graphical models (MRFs, CRFs, factor graphs): the gradient is a data-dependent expectation minus a model expectation.
Problem: both expectations are intractable!
29. Previous Work
Many approaches for learning Boltzmann machines have been proposed over the last 20 years:
• Hinton and Sejnowski (1983)
• Peterson and Anderson (1987)
• Galland (1991)
• Kappen and Rodriguez (1998)
• Lawrence, Bishop, and Jordan (1998)
• Tanaka (1998)
• Welling and Hinton (2002)
• Zhu and Liu (2002)
• Welling and Teh (2003)
• Yasuda and Tanaka (2009)
Real-world applications involve thousands of hidden and observed variables with millions of parameters. Many of the previous approaches were not successful for learning general Boltzmann machines with hidden variables, and algorithms based on Contrastive Divergence, Score Matching, Pseudo-Likelihood, Composite Likelihood, MCMC-MLE, and Piecewise Learning cannot handle multiple layers of hidden variables.
30.-32. New Learning Algorithm
The two intractable terms are attacked separately (Salakhutdinov, 2008; NIPS 2009):
• Posterior inference (conditional, data-dependent): approximate the conditional distribution via mean-field.
• Simulating from the model (unconditional, data-independent): approximate the joint distribution via Markov chain Monte Carlo.
Key idea of our approach: match the data-dependent and data-independent statistics.
Data-dependent: variational inference, mean-field theory. Data-independent: stochastic approximation, MCMC based.
33. Sampling from DBMs
Sampling from a two-hidden-layer DBM by running a Markov chain: randomly initialize the states, then alternately sample each layer given its neighbors until the chain mixes.
34. Stochastic Approximation
[Figure: persistent chains at times t = 1, 2, 3, alternating updates of the states (v, h1, h2) and the parameters θ.]
Update the fantasy particles x_t and the parameters θ_t sequentially:
• Generate x_{t+1} by simulating from a Markov chain that leaves the current model invariant (e.g. a Gibbs or M-H sampler).
• Update θ_{t+1} by replacing the intractable model expectation with a point estimate at x_{t+1}.
In practice we simulate several Markov chains in parallel.
(Robbins and Monro, Ann. Math. Stats, 1957; L. Younes, Probability Theory 1989; Tieleman, ICML 2008)
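Per weight matrix, one stochastic-approximation update then looks like the sketch below (my own illustration; v_data and h_data are the data-dependent statistics, and chains_v / chains_h are the persistent fantasy particles advanced by the Gibbs sampler above):

```python
def sap_step(W, v_data, h_data, chains_v, chains_h, lr):
    """One stochastic-approximation update for a single weight matrix."""
    # Data-dependent term uses (approximate) posterior statistics;
    # the model expectation is replaced by a point estimate from the chains.
    grad = (v_data.T @ h_data) / len(v_data) \
         - (chains_v.T @ chains_h) / len(chains_v)
    return W + lr * grad
```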
35. Stochastic Approximation
The update rule decomposes into the true gradient plus a noise term, with almost-sure convergence guarantees as the learning rate decays.
Problem: for high-dimensional data the energy landscape is highly multimodal, so plain Markov chain Monte Carlo mixes poorly.
Key insight: the transition operator can be any valid transition operator, e.g. Tempered Transitions or Parallel/Simulated Tempering. This connects to the theory of stochastic approximation and adaptive MCMC.
36. Variational Inference
For posterior inference, approximate the intractable distribution with a simpler, tractable (mean-field) distribution, and maximize the resulting variational lower bound: minimize the KL divergence between the approximating and true distributions with respect to the variational parameters.
(Salakhutdinov & Larochelle, AI & Statistics 2010)
37. Variational Inference
Mean-field: choose a fully factorized approximating distribution q(h; μ) = ∏_j q(h_j), with q(h_j = 1) = μ_j.
Variational inference: maximize the lower bound with respect to the variational parameters μ. This yields nonlinear fixed-point equations, which are iterated to convergence.
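For a two-hidden-layer DBM the fixed-point equations combine bottom-up and top-down terms; a sketch (biases omitted for brevity):

```python
def mean_field(v, W1, W2, n_iters=10):
    """Fixed-point mean-field updates for a two-hidden-layer DBM.
    mu1, mu2 approximate the posterior marginals of h1 and h2 given v."""
    mu2 = np.full(W2.shape[1], 0.5)
    for _ in range(n_iters):
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T)   # bottom-up + top-down input
        mu2 = sigmoid(mu1 @ W2)
    return mu1, mu2
```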
38. Variational Inference
The full learning algorithm alternates two phases:
1. Variational inference: maximize the lower bound w.r.t. the variational parameters (mean-field; fast inference).
2. MCMC: apply stochastic approximation to update the model parameters (unconditional simulation with persistent Markov chains).
Learning can scale to millions of examples, with almost-sure convergence guarantees to an asymptotically stable point.
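Putting the two phases together, a sketch of the outer loop (my own illustration, reusing mean_field and sap_step from the sketches above; gibbs_sweep advances the persistent chains with the sampler from slide 33):

```python
def gibbs_sweep(v, h1, h2, W1, W2):
    """One alternating-Gibbs sweep over a batch of fantasy particles."""
    h1 = (np.random.rand(*h1.shape) < sigmoid(v @ W1 + h2 @ W2.T)).astype(float)
    v  = (np.random.rand(*v.shape)  < sigmoid(h1 @ W1.T)).astype(float)
    h2 = (np.random.rand(*h2.shape) < sigmoid(h1 @ W2)).astype(float)
    return v, h1, h2

def train_dbm(data, W1, W2, v_f, h1_f, h2_f, lr=1e-3, epochs=5):
    """Two-phase DBM learning loop: mean-field for the data-dependent
    statistics, persistent Gibbs chains for the data-independent ones."""
    for _ in range(epochs):
        for v in data:
            v = v[None, :]                               # (1, n_visible) row
            mu1, mu2 = mean_field(v, W1, W2)             # 1. variational step
            v_f, h1_f, h2_f = gibbs_sweep(v_f, h1_f, h2_f, W1, W2)
            W1 = sap_step(W1, v, mu1, v_f, h1_f, lr)     # 2. stochastic approx.
            W2 = sap_step(W2, mu1, mu2, h1_f, h2_f, lr)
    return W1, W2
```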
45. Deep Boltzmann Machine
Model P(image), trained as a Bernoulli Markov random field on handwritten characters (e.g. Sanskrit): 25,000 characters from 50 alphabets around the world.
• 3,000 hidden variables
• 784 observed variables (28 by 28 images)
• Over 2 million parameters
46. Deep Boltzmann Machine
Conditional simulation: sample P(image | partial image) from the Bernoulli Markov random field to complete missing parts of the input.
47. Handwriting Recognition
MNIST dataset (60,000 examples of 10 digits) and Optical Character Recognition (42,152 examples of 26 English letters); permutation-invariant versions.

MNIST, Learning Algorithm / Error:
Logistic regression: 12.0%
K-NN: 3.09%
Neural Net (Platt 2005): 1.53%
SVM (Decoste et al. 2002): 1.40%
Deep Autoencoder (Bengio et al. 2007): 1.40%
Deep Belief Net (Hinton et al. 2006): 1.20%
DBM: 0.95%

OCR, Learning Algorithm / Error:
Logistic regression: 22.14%
K-NN: 18.92%
Neural Net: 14.62%
SVM (Larochelle et al. 2009): 9.70%
Deep Autoencoder (Bengio et al. 2007): 10.05%
Deep Belief Net (Larochelle et al. 2009): 9.68%
DBM: 8.40%
48. Deep Boltzmann Machine
Gaussian-Bernoulli Markov random field with 12,000 latent variables, modeling P(image) for stereo pairs of planes: 96 by 96 images, 24,000 training images.
49. Generative Model of 3-D Objects
24,000 examples: 5 object categories, 5 different objects within each category, 6 lighting conditions, 9 elevations, 18 azimuths.
51. Learning Part-based Hierarchy
Object parts emerge as combinations of edges when trained from multiple classes (cars, faces, motorbikes, airplanes). (Lee et al., ICML 2009)
52. Robust Boltzmann Machines
• Build more complex models that can deal with occlusions or structured noise.
A Gaussian RBM models clean faces and a binary RBM models occlusions; from the observed image the model infers a binary pixel-wise mask and the Gaussian noise.
Relates to: Le Roux, Heess, Shotton, and Winn, Neural Computation, 2011; Eslami, Heess, Winn, CVPR 2012; Tang et al., CVPR 2012.
53. Robust Boltzmann Machines
[Figures: internal states of the RoBM during learning; inference on the test subjects; comparison to other denoising algorithms.]
54. Spoken Query Detection
• 630-speaker TIMIT corpus: 3,696 training and 944 test utterances.
• 10 query keywords were randomly selected, and 10 examples of each keyword were extracted from the training set.
• Goal: for each keyword, rank all 944 utterances based on the utterance's probability of containing that keyword.
• Performance measure: the average equal error rate (EER).

Learning Algorithm / AVG EER:
GMM, unsupervised: 16.4%
DBM, unsupervised: 14.7%
DBM (1% labels): 13.3%
DBM (30% labels): 10.5%
DBM (100% labels): 9.7%

[Figure: average EER as a function of the training ratio.]
(Yaodong Zhang et al., ICASSP 2012)
55. Learning Hierarchical Representations
Deep Boltzmann Machines learn hierarchical structure in features: edges, then combinations of edges.
• Performs well in many application domains
• Combines bottom-up and top-down inference
• Fast inference: a fraction of a second
• Learning scales to millions of examples
So far: many examples, few categories. Next: few examples, many categories, i.e. transfer learning.
56. Talk Roadmap
• Unsupervised Feature Learning: Restricted Boltzmann Machines, Deep Belief Networks, Deep Boltzmann Machines
• Transfer Learning with Deep Models
• Multimodal Learning
[Figure: a category hierarchy with super-classes "animal" and "vehicle" over basic classes horse, cow, car, van, truck.]
57. One-shot Learning
"zarc" vs. "segway": how can we learn a novel concept, a high-dimensional statistical object, from few examples?
58. Learning from Few Examples
SUN database: classes (car, van, truck, bus, ...) sorted by frequency follow a long tail. Rare objects are similar to frequent objects.
60.-61. Learning to Transfer Background Knowledge
Pipeline: millions of unlabeled images → learn to transfer knowledge → some labeled images (e.g. bicycle, dolphin, elephant, tractor) → learn a novel concept from one example → test: what is this?
This is a key problem in computer vision, speech perception, natural language processing, and many other domains.
62. One-Shot Learning
Hierarchical Bayesian models place a hierarchical prior over class parameters: level 3 hyperparameters {τ0, α0}; level 2 super-class parameters {μk, τk, αk} (e.g. Animal, Vehicle); level 1 class parameters {μc, τc} (e.g. Cow, Horse, Sheep, Truck, Car).
The posterior probability of the parameters given the training data D is proportional to the probability of the observed data given the parameters times the prior probability of the weight vector W.
• Fei-Fei, Fergus, and Perona, TPAMI 2006
• E. Bart, I. Porteous, P. Perona, and M. Welling, CVPR 2007
• Miller, Matsakis, and Viola, CVPR 2000
• Sivic, Russell, Zisserman, Freeman, and Efros, CVPR 2008
63. Hierarchical-Deep Models
HD models compose hierarchical Bayesian models with deep networks, two influential approaches from unsupervised learning.
Deep networks (part-based hierarchy; cf. Marr and Nishihara, 1978):
• learn multiple layers of nonlinearities;
• are trained in unsupervised fashion (unsupervised feature learning), with no need to rely on human-crafted input representations;
• labeled data is used only to slightly adjust the model for a specific task.
Hierarchical Bayes (category-based hierarchy; cf. Collins & Quillian, 1969):
• explicitly represents category hierarchies for sharing abstract knowledge;
• explicitly identifies only a small number of parameters that are relevant to the new concept being learned.
64. Motivation
Learning to transfer knowledge:
• Super-category: "A segway looks like a funny kind of vehicle."
• Higher-level features, or parts, shared with other classes: wheel, handle, post.
• Lower-level features: edges, compositions of edges.
65.-66. Hierarchical Generative Model
A Hierarchical Latent Dirichlet Allocation model (super-classes such as "animal" and "vehicle" over classes horse, cow, car, van, truck; K topics) sits on top of a DBM model over images (Salakhutdinov, Tenenbaum, Torralba, 2011).
Hierarchical organization of categories:
• expresses priors on the features that are typical of different kinds of concepts;
• gives modular data-parameter relations.
Higher-level class-sensitive features:
• capture the distinctive perceptual structure of a specific concept.
Lower-level generic features:
• edges, combinations of edges.
67. Intuition
[Figure: the LDA graphical model with hyperparameter α, topic proportions π giving Pr(topic | doc), topic assignments z, topics θ, and words w with Pr(word | topic).]
Words ⇔ activations of the DBM's top-level units. Topics ⇔ distributions over top-level units, or higher-level parts. The DBM supplies generic features; LDA supplies high-level features. Each topic is made up of words; each document is made up of topics.
69.-70. Hierarchical Deep Model
The tree hierarchy of classes (super-classes such as "animal" and "vehicle" over horse, cow, car, van, truck) is learned:
• Nested Chinese Restaurant Process prior: a nonparametric prior over tree structures.
• Hierarchical Dirichlet Process prior: a nonparametric prior allowing categories to share higher-level features, or parts (the K topics).
71. Hierarchical Deep Model
Unlike standard statistical models, in addition to inferring parameters, we also infer the hierarchy for sharing those parameters: the Nested Chinese Restaurant Process prior over tree structures and the Hierarchical Dirichlet Process prior that lets categories share higher-level features, or parts, sit on top of a Conditional Deep Boltzmann Machine.
Enforce (approximate) global consistency through many local constraints.
72. CIFAR Object Recognition
The tree hierarchy of classes is learned on 50,000 images of 100 classes (32 x 32 pixels x 3 RGB), with 4 million unlabeled images used for the lower-level generic features; the higher-level features are class-sensitive. Inference: Markov chain Monte Carlo.
73.-74. Learning to Learn
The model learns how to share knowledge across many visual categories, producing a learned "global" super-class hierarchy over basic-level classes, learned higher-level class-sensitive features, and learned low-level generic features.
Example learned super-classes include "aquatic animal" (dolphin, turtle, shark, ray, whale), "fruit" (apple, orange, sunflower, pear, pepper), and "human" (girl, baby, man, woman, boy), along with groups such as reptiles and invertebrates (crocodile, snake, lizard, spider), structures (castle, road, bridge, skyscraper, house), vehicles (bus, truck, train, tank, tractor, streetcar), small mammals (squirrel, kangaroo, otter, skunk, shrew, porcupine, mouse, raccoon, hamster, rabbit, possum), big cats and canines (leopard, fox, tiger, lion, wolf), trees (pine, oak, maple, willow), large mammals (camel, elephant, cattle, chimpanzee, beaver), and containers (bottle, can, lamp, bowl, cup).
75. Sharing Features
Learning to learn on the "fruit" super-class (apple, orange, sunflower): real images vs. reconstructions, with learned shape and color features shared across dolphin, sunflower, orange, and apple.
[Figure: ROC curves (detection rate vs. false alarm rate) for "sunflower" with 1, 3, and 5 examples, compared against pixel-space distance.]
Learning to learn: learning a hierarchy for sharing parameters enables rapid learning of a novel concept.
76. Object Recognition
[Figure: area under the ROC curve for same/different judgments (1 new class vs. 99 distractor classes) as a function of the number of examples (1, 3, 5, 10, 50), averaged over 40 test classes, comparing HDP-DBM, LDA (class conditional), DBM (no super-classes), and GIST.]
Our model outperforms standard computer vision features (e.g. GIST).
77. Handwritten Character Recognition
25,000 characters from many alphabets ("alphabet 1", "alphabet 2", ...). Learned lower-level features: edges. Learned higher-level features: strokes, shared across characters (char 1 through char 5).
78. Handwritten Character Recognition
[Figure: area under the ROC curve for same/different judgments (1 new class vs. 1000 distractor classes) as a function of the number of examples (1, 3, 5, 10), averaged over 40 test classes, comparing HDP-DBM, LDA (class conditional), DBM (no super-classes), and raw pixels.]
79.-85. Simulating New Characters
[Figures: real data within a super-class (global level, super-class 1 and 2, class 1 and 2, new class) shown next to simulated new characters; slides 80-85 repeat the same layout with different simulated characters.]
86. Learning from Very Few Examples
Given 3 examples of a new class, the model produces conditional samples in the same class and infers the super-class.
95. Hierarchical-Deep Models
So far we have considered directed + undirected models (hierarchical LDA over a DBM). Deep Lambertian Networks combine the elegant properties of the Lambertian model with Gaussian RBMs (and Deep Belief Nets, Deep Boltzmann Machines); the learned low-level features replace GIST and SIFT. (Tang et al., ICML 2012)
96. Deep Lambertian Networks
Model specifics: the observed image is generated from inferred surface normals, albedo, and a light source, following the Lambertian reflectance model.
Inference: Gibbs sampler. Learning: stochastic approximation.
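The generative core is the pixel-wise Lambertian reflectance equation; a sketch (illustrative only; in the actual model the albedo and normals themselves carry deep priors):

```python
def lambertian_image(albedo, normals, light):
    """Pixel-wise Lambertian reflectance: intensity = albedo * max(0, n . l).
    normals: (n_pixels, 3) unit surface normals; light: (3,) direction."""
    shading = np.clip(normals @ light, 0.0, None)
    return albedo * shading          # albedo: (n_pixels,) reflectance map
```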
97. Deep Lambertian Networks
Yale B Extended Database: face relighting from one or two test images.
98. Deep Lambertian Networks
[Figure: recognition accuracy as a function of the number of training images for 10 test subjects; one-shot recognition.]
99. Recursive Neural Networks
Recursive structure learning: local recursive networks predict whether to merge the two inputs as well as the label of the merged node. Training uses max-margin estimation. (Socher et al., ICML 2011)
101.-102. Learning from Few Examples
SUN database: classes (car, van, truck, bus; chair, armchair, swivel chair, deck chair) sorted by frequency follow a long tail. Rare objects are similar to frequent objects.
103. Generative Model of Classifier Parameters
Many state-of-the-art object detection systems use sophisticated models based on multiple parts with separate appearance and shape components, detecting objects by testing sub-windows and scoring the corresponding test patches with a linear function (Felzenszwalb, McAllester & Ramanan, 2008). The image-specific representation is a concatenation of the HOG feature pyramid at multiple scales.
Idea: define a hierarchical prior over the parameters of the discriminative model and learn the hierarchy.
104. Generative Model of Classifier Parameters
Hierarchical Bayes: by learning the hierarchical structure, we can improve on the current state of the art.
Sun Dataset: 32,855 examples of 200 categories (e.g. classes with 185, 27, and 12 examples), comparing the hierarchical model against single-class training.
105. Talk Roadmap
• Unsupervised Feature Learning: Restricted Boltzmann Machines (RBM), Deep Belief Networks, Deep Boltzmann Machines (DBM)
• Transfer Learning with Deep Models
• Multimodal Learning
106. Multi-Modal Input
Learning systems that combine multiple input domains: images, text & language, video, laser scans, speech & audio, time series data.
One of the key challenges: inference. The goal is to develop learning systems that come closer to displaying human-like intelligence.
107. Multi-Modal Input
Learning systems that combine multiple input domains (image + text) give more robust perception. Ngiam et al., ICML 2011 used deep autoencoders (video + speech).
• Guillaumin, Verbeek, and Schmid, CVPR 2011
• Huiskes, Thomee, and Lew, Multimedia Information Retrieval, 2010
• Xing, Yan, and Hauptmann, UAI 2005
108. Training Data
Samples from the MIR Flickr Dataset (Creative Commons license), each image paired with user tags, e.g.:
• pentax, k10d, kangarooisland, southaustralia, sa, australia, australiansealion, 300mm
• camera, jahdakine, lightpainting, reflection, doublepaneglass, wowiekazowie
• sandbanks, lake, lakeontario, sunset, walking, beach, purple, sky, water, clouds, overtheexcellence
• top20butterflies
• mickikrimmel, mickipedia, headshot
• <no text>
109. Multi-Modal Input
• Improve classification: e.g. the tags (pentax, k10d, kangarooisland, southaustralia, sa, australia, australiansealion, 300mm) help decide SEA / NOT SEA.
• Fill in missing modalities: generate tags such as beach, sea, surf, strand, shore, wave, seascape, sand, ocean, waves.
• Retrieve data from one modality when queried using data from another modality.