One-Shot Learning    Level 3 {τ0, α0}    ...
Hierarchical-Deep Models    HD Models: Compose hierarchical Bayesian Deep N...
Motivation    Learning to transfer knowledge: ...
Hierarchical Generative Model    Hierarchical Latent Dirichlet ...
Intuition    α Pr(topic | doc) ...
Hierarchical Deep Model    "animal"    "vehicle" ...
Hierarchical Deep Model    Tree hierarchy of ...
CIFAR Object Recognition    Tree hierarchy of ...
Learning to Learn    The model learns how to share the knowledge across many visual categ...
Sharing	  Features	                                                                            “fruit”	                   ...
Object Recognition    Area under ROC curve for same/different (1 new class vs. 99 distractor classes)    1 ...
Handwritten Character Recognition    "alphabet 1" ...
Handwritten Character Recognition    Area under ROC curve for same/different (1 new class vs. 1000 distracto...
Simulating New Characters    Real data within super ...
Learning from very few examples    3 examples of a new class    Conditional samples in the same cl...
Hierarchical-Deep    So far we have considered directed + undirected models. ...
Deep Lambertian Networks    Model Specifics    Image    Surfa...
Deep Lambertian Networks    Yale B Extended Database    One Test Image    Two Test Images    Face Rel...
Deep Lambertian Networks    Recognition as function of the number of training images for 10 test ...
Recursive	  Neural	  Networks	  Recursive	  structure	  learning	  	                                               Local	 ...
Recursive	  Neural	  Networks	  Recursive	  structure	  learning	  	                                               Socher	...
Learning	  from	  Few	  Examples	  SUN	  database	                                       car	                             ...
Learning	  from	  Few	  Examples	              chair	                                 armchair	                           ...
Generative Model of Classifier Parameters    Many state-of-the-art object detection ...
Generative Model of Classifier Parameters ...
Talk	  Roadmap	                    •  Unsupervised	  Feature	  Learning	                        –  Restricted	  Boltzmann	...
Multi-Modal Input    Learning systems that combine multiple input domains    Images ...
Training	  Data	  pentax,	  k10d,	                                                                         camera,	  jahda...
P05: Deep Boltzmann Machines (CVPR 2012 tutorial, Deep Learning Methods for Vision)
1. Deep Learning
   Russ Salakhutdinov
   Department of Statistics and Computer Science, University of Toronto
2. Mining for Structure
   Massive increase in both computational power and the amount of data available from the web, video cameras, laboratory measurements.
   Domains: Images & Video, Text & Language, Speech & Audio, Gene Expression, Relational Data / Social Networks, Climate Change, Product Recommendation, Geological Data. Mostly unlabeled.
   • Develop statistical models that can discover underlying structure, cause, or statistical correlation from data in an unsupervised or semi-supervised way.
   • Multiple application domains.
3. Talk Roadmap
   • Unsupervised Feature Learning
     – Restricted Boltzmann Machines (RBM)
     – Deep Belief Networks
     – Deep Boltzmann Machines (DBM)
   • Transfer Learning with Deep Models
   • Multimodal Learning
4. Restricted Boltzmann Machines
   Bipartite structure: stochastic binary visible variables v are connected to stochastic binary hidden variables h.
   The energy of the joint configuration:
     E(v, h; θ) = -v^T W h - a^T v - b^T h, with model parameters θ = {W, a, b}.
   The probability of the joint configuration is given by the Boltzmann distribution:
     P(v, h) = exp(-E(v, h; θ)) / Z(θ),
   where Z(θ) is the partition function and the exp(-E) terms are the potential functions.
   Related: Markov random fields, Boltzmann machines, log-linear models.
5. Restricted Boltzmann Machines
   Bipartite structure. Restricted: no interactions between hidden variables.
   Inferring the distribution over the hidden variables is easy because it factorizes:
     P(h | v) = Π_j P(h_j | v), with P(h_j = 1 | v) = σ(Σ_i W_ij v_i + b_j),
   where σ is the logistic function, so each factor is easy to compute. Similarly for P(v | h).
   Related: Markov random fields, Boltzmann machines, log-linear models.
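Because the conditionals factorize, posterior inference in an RBM is a single matrix-vector product followed by an elementwise sigmoid. A minimal numpy sketch (shapes and names are illustrative, not the tutorial's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical shapes: v is (D,), h is (K,), W is (D, K), biases b_h (K,), b_v (D,).
def p_h_given_v(v, W, b_h):
    """Factorized posterior over binary hidden units: P(h_j = 1 | v) = sigmoid(sum_i W_ij v_i + b_j)."""
    return sigmoid(v @ W + b_h)

def p_v_given_h(h, W, b_v):
    """Symmetric conditional for binary visible units."""
    return sigmoid(h @ W.T + b_v)
```

With all weights and biases at zero, both conditionals are uniform (probability 0.5 per unit), which is a handy sanity check.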
6. Model Learning
   Given a set of i.i.d. training examples D = {v^(1), ..., v^(N)}, we want to learn the model parameters θ = {W, a, b}.
   Maximize the (penalized) log-likelihood objective:
     L(θ) = Σ_n log P(v^(n); θ) - regularization.
   Derivative of the log-likelihood:
     ∂L/∂W_ij = E_data[v_i h_j] - E_model[v_i h_j].
   The model expectation is difficult to compute: exponentially many configurations.
7. Model Learning
   Same objective and derivative as above; since the model expectation is intractable, use approximate maximum likelihood learning:
   • Contrastive Divergence (Hinton 2000)
   • Pseudo-Likelihood (Besag 1977)
   • MCMC-MLE estimator (Geyer 1991)
   • Composite Likelihoods (Lindsay 1988; Varin 2008)
   • Tempered MCMC (Salakhutdinov, NIPS 2009)
   • Adaptive MCMC (Salakhutdinov, ICML 2010)
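Of these approximations, Contrastive Divergence is the most widely used for RBMs: it replaces the intractable model expectation with statistics from a one-step reconstruction. A minimal CD-1 sketch in numpy for binary units (learning rate and names are illustrative placeholders, not the tutorial's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_v, b_h, lr=0.01):
    """One CD-1 parameter update on a single binary data vector v0."""
    # Positive phase: data-driven hidden probabilities and a sample.
    ph0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step gives the CD-1 "reconstruction".
    pv1 = sigmoid(h0 @ W.T + b_v)
    ph1 = sigmoid(pv1 @ W + b_h)
    # Approximate gradient: <v h>_data - <v h>_reconstruction.
    W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    b_v += lr * (v0 - pv1)
    b_h += lr * (ph0 - ph1)
    return W, b_v, b_h
```

In practice the update is computed over mini-batches rather than single vectors, but the two-phase structure is the same.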
8. RBMs for Images
   Gaussian-Bernoulli RBM: define energy functions for various data modalities, here Gaussian visible variables for image pixels and Bernoulli hidden variables.
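The slide's energy equations did not survive the transcript; one standard parameterization of the Gaussian-Bernoulli RBM energy (real-valued visibles $v_i$ with scale $\sigma_i$, binary hiddens $h_j$) is:

```latex
E(\mathbf{v}, \mathbf{h}) \;=\; \sum_i \frac{(v_i - a_i)^2}{2\sigma_i^2}
\;-\; \sum_j b_j h_j \;-\; \sum_{i,j} \frac{v_i}{\sigma_i}\, W_{ij}\, h_j ,
\qquad
p(v_i \mid \mathbf{h}) \;=\; \mathcal{N}\!\Big(a_i + \sigma_i \sum_j W_{ij} h_j,\; \sigma_i^2\Big).
```

Each hidden configuration shifts the mean of the Gaussian over the visibles, which is what makes the mixture interpretation on the next slide possible.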
9. RBMs for Images
   Gaussian-Bernoulli RBM. Interpretation: a mixture of an exponential number of Gaussians,
     P(v) = Σ_h P(h) P(v | h),
   where P(h) is an implicit prior and each conditional P(v | h) is a Gaussian whose mean is determined by the hidden configuration.
10. RBMs for Images and Text
    Images: Gaussian-Bernoulli RBM. Learned features (out of 10,000) from 4 million unlabelled images.
    Text: Multinomial-Bernoulli RBM over bag-of-words input. Learned features are "topics" (Reuters dataset: 804,414 unlabeled newswire stories), e.g.:
      russian:  russia, moscow, yeltsin, soviet
      clinton:  house, president, bill, congress
      computer: system, product, software, develop
      trade:    country, import, world, economy
      stock:    wall, street, point, dow
11. Collaborative Filtering
    Multinomial visible variables: user ratings. Bernoulli hidden variables: user preferences.
    Learned features are "genres", e.g.:
      Fahrenheit 9/11, Bowling for Columbine, The People vs. Larry Flynt, Canadian Bacon, La Dolce Vita
      Independence Day, The Day After Tomorrow, Con Air, Men in Black II, Men in Black
      Friday the 13th, The Texas Chainsaw Massacre, Children of the Corn, Child's Play, The Return of Michael Myers
      Scary Movie, Naked Gun, Hot Shots!, American Pie, Police Academy
    Netflix dataset: 480,189 users, 17,770 movies, over 100 million ratings.
    State-of-the-art performance on the Netflix dataset. Relates to Probabilistic Matrix Factorization (Salakhutdinov & Mnih, ICML 2007).
12. Multiple Application Domains
    • Natural Images
    • Text/Documents
    • Collaborative Filtering / Matrix Factorization
    • Video (Langford et al. ICML 2009; Lee et al.)
    • Motion Capture (Taylor et al. NIPS 2007)
    • Speech Perception (Dahl et al. NIPS 2010; Lee et al. NIPS 2010)
    Same learning algorithm across multiple input domains.
    Limitation: a single layer of low-level features restricts the types of structure that can be represented!
13. Talk Roadmap
    • Unsupervised Feature Learning
      – Restricted Boltzmann Machines (RBM)
      – Deep Belief Networks
      – Deep Boltzmann Machines (DBM)
    • Transfer Learning with Deep Models
    • Multimodal Learning
14. Deep Belief Network
    Input: image pixels. Low-level features: edges. Built from unlabeled inputs.
15. Deep Belief Network
    Unsupervised feature learning: internal representations capture higher-order statistical structure.
    Input pixels; low-level features (edges); higher-level features (combinations of edges). Built from unlabeled inputs.
    (Hinton et al., Neural Computation 2006)
16. Deep Belief Network
    The joint probability distribution factorizes:
      P(v, h^1, h^2, h^3) = P(v | h^1) P(h^1 | h^2) P(h^2, h^3),
    where the top two layers P(h^2, h^3) form an RBM and the lower layers form a sigmoid belief network.
17. DBNs for Classification
    Pretraining (a stack of RBMs with 500, 500, and 2000 hidden units), unrolling, and fine-tuning with a 10-way softmax output.
    • After layer-by-layer unsupervised pretraining, discriminative fine-tuning by backpropagation achieves an error rate of 1.2% on MNIST. SVMs get 1.4% and randomly initialized backprop gets 1.6%.
    • Clearly unsupervised learning helps generalization. It ensures that most of the information in the weights comes from modeling the input data.
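The pretraining stage amounts to greedy layer-wise stacking: train an RBM on the data, then train the next RBM on the hidden activations of the one below. A self-contained numpy illustration (a toy CD-1 trainer; layer sizes, learning rate, and epoch count are placeholders, not the slide's 500-500-2000 setup unless you pass those sizes):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.05, epochs=5):
    """Tiny CD-1 trainer over rows of `data`; returns weights and hidden biases."""
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hidden))
    b_v, b_h = np.zeros(n_vis), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            ph0 = sigmoid(v0 @ W + b_h)
            h0 = (rng.random(n_hidden) < ph0).astype(float)
            pv1 = sigmoid(h0 @ W.T + b_v)
            ph1 = sigmoid(pv1 @ W + b_h)
            W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
            b_h += lr * (ph0 - ph1)
            b_v += lr * (v0 - pv1)
    return W, b_h

def pretrain_stack(data, layer_sizes):
    """Greedy layer-wise pretraining: each RBM is trained on the mean hidden
    activations of the layer below."""
    params, x = [], data
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(x, n_hidden)
        params.append((W, b_h))
        x = sigmoid(x @ W + b_h)  # propagate mean activations upward
    return params
```

After pretraining, the returned weights would initialize the feed-forward network that backpropagation then fine-tunes.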
18. Deep Autoencoders
    [Architecture diagram: encoder layers pretrained as RBMs, unrolled into a symmetric encoder-decoder network; stages labeled Pretraining, Unrolling, Fine-tuning.]
19. Deep Generative Model
    Model P(document). Reuters dataset: 804,414 newswire stories, unsupervised; bag-of-words input.
    [2-D visualization with clusters: European Community Monetary/Economic, Interbank Markets, Energy Markets, Disasters and Accidents, Leading Economic Indicators, Legal/Judicial, Government Borrowings, Accounts/Earnings.]
    (Hinton & Salakhutdinov, Science 2006)
20. Information Retrieval
    [2-D LSA space with the same topic clusters: Interbank Markets, European Community Monetary/Economic, Energy Markets, Disasters and Accidents, Leading Economic Indicators, Legal/Judicial, Government Borrowings, Accounts/Earnings.]
    • The Reuters Corpus Volume II contains 804,414 newswire stories (randomly split into 402,207 training and 402,207 test).
    • "Bag-of-words": each article is represented as a vector containing the counts of the most frequently used 2000 words in the training set.
21. Semantic Hashing
    [Diagram: a semantic hashing function maps each document into a 20-D binary address space; semantically similar documents land at nearby addresses.]
    • Learn to map documents into semantic 20-D binary codes.
    • Retrieve similar documents stored at the nearby addresses with no search at all.
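Retrieval with such codes reduces to looking within a small Hamming ball around the query's code. A linear-scan numpy sketch (a real semantic-hashing system enumerates nearby addresses directly instead of scanning; names and the code width are illustrative):

```python
import numpy as np

def hamming_neighbors(query_code, db_codes, radius=2):
    """Indices of stored binary codes within `radius` bit flips of the query.

    query_code: (B,) array of 0/1 bits; db_codes: (N, B) array of stored codes.
    """
    dists = (db_codes != query_code).sum(axis=1)  # Hamming distance per document
    return np.flatnonzero(dists <= radius)
```

The point of semantic hashing is that this lookup cost is independent of corpus size when done by address enumeration: with 20-bit codes and a small radius, only a few thousand addresses need to be probed.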
22. Searching Large Image Databases using Binary Codes
    • Map images into binary codes for fast retrieval.
    • Small Codes: Torralba, Fergus, Weiss, CVPR 2008
    • Spectral Hashing: Y. Weiss, A. Torralba, R. Fergus, NIPS 2008
    • Kulis and Darrell, NIPS 2009; Gong and Lazebnik, CVPR 2011
    • Norouzi and Fleet, ICML 2011
23. Talk Roadmap
    • Unsupervised Feature Learning
      – Restricted Boltzmann Machines (RBM)
      – Deep Belief Networks
      – Deep Boltzmann Machines (DBM)
    • Transfer Learning with Deep Models
    • Multimodal Learning
24. DBNs vs. DBMs
    Deep Belief Network vs. Deep Boltzmann Machine: the same layer structure (v, h^1, h^2, h^3 with weights W^1, W^2, W^3), but the DBN is directed except for its top RBM, while the DBM is fully undirected.
    DBNs are hybrid models:
    • Inference in DBNs is problematic due to explaining away.
    • Only greedy pretraining, no joint optimization over all layers.
    • Approximate inference is feed-forward: no bottom-up and top-down.
    We introduce a new class of models called Deep Boltzmann Machines.
25. Mathematical Formulation
    Deep Boltzmann Machine with model parameters θ = {W^1, W^2, W^3}:
      P(v, h^1, h^2, h^3; θ) ∝ exp(v^T W^1 h^1 + (h^1)^T W^2 h^2 + (h^2)^T W^3 h^3).
    • Dependencies between hidden variables.
    • All connections are undirected.
    • Bottom-up and top-down:
      P(h^1_j = 1 | v, h^2) = σ(Σ_i W^1_ij v_i + Σ_k W^2_jk h^2_k)  [bottom-up input from v, top-down input from h^2].
    Unlike many existing feed-forward models: ConvNet (LeCun), HMAX (Poggio et al.), Deep Belief Nets (Hinton et al.).
26. Mathematical Formulation
    Deep Boltzmann Machine, Deep Belief Network, and Neural Network share the same layer structure (input v, hidden h^1, h^2, h^3, output), but the DBM is fully undirected, the DBN is directed except for its top RBM, and the neural network is purely feed-forward.
    Unlike many existing feed-forward models: ConvNet (LeCun), HMAX (Poggio), Deep Belief Nets (Hinton).
27. Mathematical Formulation
    Same comparison, highlighting inference: in the DBM, inference over each hidden layer combines bottom-up and top-down signals, whereas in the DBN and the neural network inference is purely feed-forward.
    Unlike many existing feed-forward models: ConvNet (LeCun), HMAX (Poggio), Deep Belief Nets (Hinton).
28. Mathematical Formulation
    Deep Boltzmann Machine with model parameters θ; dependencies between hidden variables.
    Maximum likelihood learning:
      ∂ log P(v; θ)/∂W^1 = E_data[v (h^1)^T] - E_model[v (h^1)^T].
    Problem: both expectations are intractable!
    This is the general learning rule for undirected graphical models: MRFs, CRFs, factor graphs.
29. Previous Work
    Many approaches for learning Boltzmann machines have been proposed over the last 20 years:
    • Hinton and Sejnowski (1983)
    • Peterson and Anderson (1987)
    • Galland (1991)
    • Kappen and Rodriguez (1998)
    • Lawrence, Bishop, and Jordan (1998)
    • Tanaka (1998)
    • Welling and Hinton (2002)
    • Zhu and Liu (2002)
    • Welling and Teh (2003)
    • Yasuda and Tanaka (2009)
    Real-world applications involve thousands of hidden and observed variables with millions of parameters.
    Many of the previous approaches were not successful for learning general Boltzmann machines with hidden variables. Algorithms based on Contrastive Divergence, Score Matching, Pseudo-Likelihood, Composite Likelihood, MCMC-MLE, and Piecewise Learning cannot handle multiple layers of hidden variables.
30. New Learning Algorithm
    Posterior inference (conditional): approximate the conditional. Simulating from the model (unconditional): approximate the joint distribution.
    (Salakhutdinov, 2008; NIPS 2009)
31. New Learning Algorithm
    Posterior inference approximates the conditional, a data-dependent density; simulating from the model approximates the joint distribution, which is data-independent. Learning matches the two.
32. New Learning Algorithm
    Key idea of our approach:
    • Data-dependent (posterior inference): variational inference, mean-field theory.
    • Data-independent (simulating from the model): stochastic approximation, MCMC based (Markov chain Monte Carlo).
    Learning matches the data-dependent and data-independent statistics.
33. Sampling from DBMs
    Sampling from a two-hidden-layer DBM by running a Markov chain: randomly initialize the state, then repeatedly Gibbs-sample the layers; after the chain mixes, the state is a sample from the model.
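The Gibbs chain for a two-hidden-layer DBM exploits the layered structure: h^1 conditions on both v and h^2, while v and h^2 each condition only on h^1, so they can be refreshed together. A minimal numpy sketch (biases omitted and shapes illustrative, not the tutorial's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    """Draw binary states from elementwise Bernoulli probabilities."""
    return (rng.random(p.shape) < p).astype(float)

def gibbs_sweep(v, h1, h2, W1, W2):
    """One alternating Gibbs sweep: h1 gets bottom-up (v W1) and
    top-down (h2 W2^T) input; v and h2 depend only on h1."""
    h1 = sample(sigmoid(v @ W1 + h2 @ W2.T))
    v = sample(sigmoid(h1 @ W1.T))
    h2 = sample(sigmoid(h1 @ W2))
    return v, h1, h2
```

Running `gibbs_sweep` in a loop from a random initialization is exactly the chain the slide describes; the stochastic-approximation learning on the next slides keeps such chains persistent across parameter updates.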
34. Stochastic Approximation
    Time t = 1, 2, 3, ...: update the chain state x_t = (v_t, h_t) and the parameters θ_t sequentially:
    • Generate x_{t+1} by simulating from a Markov chain that leaves P(·; θ_t) invariant (e.g. a Gibbs or M-H sampler).
    • Update θ_{t+1} by replacing the intractable model expectation with a point estimate at x_{t+1}.
    In practice we simulate several Markov chains in parallel.
    Robbins and Monro, Ann. Math. Stats, 1957; L. Younes, Probability Theory 1989; Tieleman, ICML 2008.
35. Stochastic Approximation
    The update rule decomposes into the true gradient plus a noise term.
    Almost sure convergence guarantees as the learning rate decays to zero.
    Problem: for high-dimensional data the energy landscape is highly multimodal, so the Markov chain Monte Carlo sampler can get stuck.
    Key insight: the transition operator can be any valid transition operator, e.g. tempered transitions or parallel/simulated tempering.
    Connections to the theory of stochastic approximation and adaptive MCMC.
36. Variational Inference
    Posterior inference: approximate the intractable distribution P(h | v; θ) with a simpler, tractable distribution Q(h; μ) (mean-field).
    Variational lower bound:
      log P(v; θ) ≥ Σ_h Q(h; μ) log P(v, h; θ) + H(Q) = log P(v; θ) - KL(Q(h; μ) || P(h | v; θ)).
    Minimize the KL divergence between the approximating and true distributions with respect to the variational parameters μ.
    (Salakhutdinov & Larochelle, AI & Statistics 2010)
37. Variational Inference
    Mean-field: choose a fully factorized distribution Q(h; μ) = Π_j q(h_j), with q(h_j = 1) = μ_j.
    Variational inference: maximize the lower bound w.r.t. the variational parameters μ.
    This yields nonlinear fixed-point equations for the μ's.
38. Variational Inference
    1. Variational inference: maximize the lower bound w.r.t. the variational parameters μ (mean-field, fast inference).
    2. MCMC: apply stochastic approximation to update the model parameters θ (unconditional simulation via a Markov chain).
    Learning can scale to millions of examples, with almost sure convergence guarantees to an asymptotically stable point.
39. Good Generative Model? Handwritten Characters
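The mean-field updates from the variational-inference slides above can be sketched as a short fixed-point iteration. A minimal numpy illustration for a two-hidden-layer DBM (biases omitted; shapes and the iteration count are placeholders, not the tutorial's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(v, W1, W2, n_iters=25):
    """Iterate the fully factorized mean-field equations:
        mu1 = sigmoid(v W1 + mu2 W2^T)   # bottom-up plus top-down input
        mu2 = sigmoid(mu1 W2)
    The mu's are the variational parameters q(h_j = 1)."""
    mu1 = np.full(W1.shape[1], 0.5)
    mu2 = np.full(W2.shape[1], 0.5)
    for _ in range(n_iters):
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T)
        mu2 = sigmoid(mu1 @ W2)
    return mu1, mu2
```

This is the fast data-dependent half of the learning algorithm; the data-independent half is the persistent Gibbs chain from the stochastic-approximation slides.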
  40. 40. Good  Genera:ve  Model?  Handwripen  Characters  
  41. 41. Good  Genera:ve  Model?  Handwripen  Characters   Simulated   Real  Data  
  42. 42. Good  Genera:ve  Model?  Handwripen  Characters   Real  Data   Simulated  
  43. 43. Good  Genera:ve  Model?  Handwripen  Characters  
  44. 44. Good  Genera:ve  Model?  MNIST  Handwripen  Digit  Dataset  
  45. 45. Deep  Boltzmann  Machine   Sanskrit   Model  P(image)   25,000  characters  from  50   alphabets  around  the  world.   •   3,000  hidden  variables   •   784    observed  variables        (28  by  28  images)   •   Over  2  million  parameters   Bernoulli  Markov  Random  Field  
  46. 46. Deep  Boltzmann  Machine   Condi:onal   Simula:on  P(image|par:al  image)   Bernoulli  Markov  Random  Field  
  47. 47. Handwri:ng  Recogni:on   MNIST  Dataset   Op:cal  Character  Recogni:on   60,000  examples  of  10  digits   42,152  examples  of  26  English  lepers     Learning  Algorithm   Error   Learning  Algorithm   Error   Logis:c  regression   12.0%   Logis:c  regression   22.14%   K-­‐NN     3.09%   K-­‐NN     18.92%   Neural  Net  (Plap  2005)   1.53%   Neural  Net   14.62%   SVM  (Decoste  et.al.  2002)   1.40%   SVM  (Larochelle  et.al.  2009)   9.70%   Deep  Autoencoder   1.40%   Deep  Autoencoder   10.05%   (Bengio  et.  al.  2007)     (Bengio  et.  al.  2007)     Deep  Belief  Net   1.20%   Deep  Belief  Net   9.68%   (Hinton  et.  al.  2006)     (Larochelle  et.  al.  2009)     DBM     0.95%   DBM   8.40%  Permuta:on-­‐invariant  version.  
  48. 48. Deep  Boltzmann  Machine   Gaussian-­‐Bernoulli  Markov   Deep  Boltzmann  Machine   Random  Field   12,000  Latent     Variables   Model  P(image)  Planes   96  by  96   images   24,000  Training  Images   Stereo  pair  
  49. 49. Genera:ve  Model  of  3-­‐D  Objects  24,000  examples,  5  object  categories,  5  different  objects  within  each    category,  6  lightning  condi:ons,  9  eleva:ons,  18  azimuths.      
  50. 50. 3-­‐D  Object  Recogni:on   Papern  Comple:on   Learning  Algorithm   Error   Logis:c  regression   22.5%   K-­‐NN  (LeCun  2004)   18.92%   SVM  (Bengio  &  LeCun    2007)   11.6%   Deep  Belief  Net  (Nair  &   9.0%   Hinton    2009)     DBM   7.2%  Permuta:on-­‐invariant  version.  
  51. 51. Learning  Part-­‐based  Hierarchy   Object  parts.   Combina:on  of  edges.   Trained  from  mul:ple  classes                     (cars,  faces,  motorbikes,  airplanes).   Lee  et.al.,  ICML  2009  
  52. 52. Robust  Boltzmann  Machines  •   Build  more  complex  models  that  can  deal  with  occlusions  or  structured  noise.     Gaussian  RBM,  modeling   Binary  RBM  modeling   clean  faces   occlusions   Inferred   Binary  pixel-­‐wise   Gaussian  noise   Mask   Observed  Relates  to  Le  Roux,  Heess,  Shopon,  and  Winn,  Neural  Computa:on,  2011  Eslami,  Heess,  Winn,  CVPR  2012   Tang  et.  al.,  CVPR  2012  
  53. 53. Robust  Boltzmann  Machines   Internal  States  of   RoBM  during   learning.     Inference  on  the  test   subjects   Comparing  to  Other   Denoising  Algorithms  
  54. 54. Spoken  Query  Detec:on  •  630  speaker  TIMIT  corpus:  3,696  training  and  944  test  uperances.  •  10  query  keywords  were  randomly  selected  and  10  examples  of   each  keyword  were  extracted  from  the  training  set.    •  Goal:  For  each  keyword,  rank  all  944  uperances  based  on  the   uperance’s  probability  of  containing  that  keyword.      •  Performance  measure:  The  average  equal  error  rate  (EER).     13.5 Learning  Algorithm   AVG  EER   13 GMM  Unsupervised     16.4%   12.5 12 DBM  Unsupervised     14.7%   Avg. EER 11.5 DBM  (1%  labels)   13.3%   11 DBM  (30%  labels)   10.5%   10.5 DBM  (100%  labels)   9.7%   10 9.5 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Training Ratio (Yaodong  Zhang  et.al.    ICASSP  2012)!
55. Learning Hierarchical Representations. Deep Boltzmann Machines: learning hierarchical structure in features (edges, combinations of edges). • Performs well in many application domains • Combines bottom-up and top-down inference • Fast inference: a fraction of a second • Learning scales to millions of examples. So far: many examples, few categories. Next: few examples, many categories (transfer learning).
56. Talk Roadmap. • Unsupervised Feature Learning – Restricted Boltzmann Machines – Deep Belief Networks – Deep Boltzmann Machines • Transfer Learning with Deep Models • Multimodal Learning. [Diagram: DBM; "animal", "vehicle"; horse, cow, car, van, truck.]
57. One-Shot Learning. "zarc", "segway". How can we learn a novel concept, a high-dimensional statistical object, from few examples?
58. Learning from Few Examples. SUN database: car, van, truck, bus. Classes sorted by frequency. Rare objects are similar to frequent objects.
59. Traditional Supervised Learning. Segway. Motorcycle. Test: what is this?
60. Learning to Transfer. Background knowledge: millions of unlabeled images; learn to transfer knowledge. Some labeled images: bicycle, dolphin, elephant, tractor. Learn a novel concept from one example. Test: what is this?
61. Learning to Transfer. Background knowledge: millions of unlabeled images; learn to transfer knowledge. A key problem in computer vision, speech perception, natural language processing, and many other domains. Some labeled images: bicycle, dolphin, elephant, tractor. Learn a novel concept from one example. Test: what is this?
62. One-Shot Learning: Hierarchical Bayesian Models. Hierarchical prior: Level 3 {τ^0, α^0}; Level 2 {µ^k, τ^k, α^k} (Animal, Vehicle, ...); Level 1 {µ^c, τ^c} (Cow, Horse, Sheep, Truck, Car). Probability of the observed data given the parameters; prior probability of the weight vector W; posterior probability of the parameters given the training data D.
• Fei-Fei, Fergus, and Perona, TPAMI 2006
• E. Bart, I. Porteous, P. Perona, and M. Welling, CVPR 2007
• Miller, Matsakis, and Viola, CVPR 2000
• Sivic, Russell, Zisserman, Freeman, and Efros, CVPR 2008
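The three labelled quantities on this slide combine via Bayes' rule. A hedged reconstruction in the slide's own level notation (the exact factorization in the cited papers may differ):

```latex
\underbrace{p(W \mid D)}_{\text{posterior}} \;\propto\;
\underbrace{p(D \mid W)}_{\text{likelihood of observed data}}\;
\underbrace{p\!\left(W \mid \mu^{c}, \tau^{c}\right)}_{\text{class-level prior}},
```

where the hyperpriors $p(\mu^{c}, \tau^{c} \mid \mu^{k}, \tau^{k}, \alpha^{k})$ and $p(\mu^{k}, \tau^{k}, \alpha^{k} \mid \tau^{0}, \alpha^{0})$ tie each class (level 1) to its super-class (level 2) and to the root (level 3), which is how a new class with few examples borrows statistical strength from related classes.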
63. Hierarchical-Deep Models. HD models: compose hierarchical Bayesian models with deep networks, two influential approaches from unsupervised learning.
Deep networks (part-based hierarchy; Marr and Nishihara, 1978):
• learn multiple layers of nonlinearities;
• are trained in an unsupervised fashion (unsupervised feature learning), with no need to rely on human-crafted input representations;
• labeled data is used to slightly adjust the model for a specific task.
Hierarchical Bayes (category-based hierarchy; Collins & Quillian, 1969):
• explicitly represents category hierarchies for sharing abstract knowledge;
• explicitly identifies only a small number of parameters that are relevant to the new concept being learned.
64. Motivation. Learning to transfer knowledge:
• Super-category (super-class hierarchy): "A segway looks like a funny kind of vehicle."
• Higher-level features, or parts, shared with other classes: wheel, handle, post.
• Lower-level features (deep): edges, compositions of edges.
65. Hierarchical Generative Model. Hierarchical Latent Dirichlet Allocation model: "animal", "vehicle"; horse, cow, car, van, truck; K topics. DBM model. Lower-level generic features: edges, combinations of edges. Images. (Salakhutdinov, Tenenbaum, Torralba, 2011)
66. Hierarchical Generative Model. Hierarchical Latent Dirichlet Allocation model over a DBM.
Hierarchical organization of categories ("animal", "vehicle"): express priors on the features that are typical of different kinds of concepts; modular data-parameter relations.
Higher-level class-sensitive features (K topics; horse, cow, car, van, truck): capture the distinctive perceptual structure of a specific concept.
Lower-level generic features (DBM model): edges, combinations of edges. Images. (Salakhutdinov, Tenenbaum, Torralba, 2011)
67. Intuition. [LDA graphical model: α, π, z, θ, w; Pr(topic | doc), Pr(word | topic).]
Words ⇔ activations of the DBM's top-level units.
Topics ⇔ distributions over top-level units, i.e. higher-level parts.
DBM generic features play the role of words over images; LDA high-level features play the role of topics over documents.
Each topic is made up of words; each document is made up of topics.
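A toy illustration of this analogy (all data below is made up, and this is plain LDA rather than the HDP used later in the talk): each "document" is the list of active top-level DBM unit indices for one image, and topics are fit over them with collapsed Gibbs sampling.

```python
import random

def lda_gibbs(docs, n_topics, n_units, alpha=0.1, beta=0.1, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA over lists of active-unit indices."""
    rng = random.Random(seed)
    # z[d][i]: topic assigned to the i-th active unit of document d
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * n_topics for _ in docs]              # n(d, k)
    topic_word = [[0] * n_units for _ in range(n_topics)]   # n(k, w)
    topic_total = [0] * n_topics                            # n(k, .)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d][k] += 1; topic_word[k][w] += 1; topic_total[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove the current assignment
                doc_topic[d][k] -= 1; topic_word[k][w] -= 1; topic_total[k] -= 1
                # p(k | rest) is proportional to
                #   (n(d,k)+alpha) * (n(k,w)+beta) / (n(k,.) + n_units*beta)
                weights = [(doc_topic[d][t] + alpha)
                           * (topic_word[t][w] + beta)
                           / (topic_total[t] + n_units * beta)
                           for t in range(n_topics)]
                r = rng.random() * sum(weights)
                k = n_topics - 1
                for t, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        k = t
                        break
                z[d][i] = k  # reassign
                doc_topic[d][k] += 1; topic_word[k][w] += 1; topic_total[k] += 1
    return doc_topic, topic_word

# Two clusters of "images": units 0-3 co-activate, units 4-7 co-activate.
docs = [[0, 1, 2, 3]] * 5 + [[4, 5, 6, 7]] * 5
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, n_units=8)
```

On this clean toy data the sampler typically assigns the two co-activation clusters to the two topics, which is exactly the "topics are higher-level parts" picture the slide describes.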
68. Hierarchical Deep Model. "animal", "vehicle"; horse, cow, car, van, truck; K topics.
69. Hierarchical Deep Model. The tree hierarchy of classes is learned. Nested Chinese Restaurant Process prior: a nonparametric prior over tree structures.
70. Hierarchical Deep Model. The tree hierarchy of classes is learned. Nested Chinese Restaurant Process prior: a nonparametric prior over tree structures. Hierarchical Dirichlet Process prior: a nonparametric prior allowing categories to share higher-level features, or parts (K topics).
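A minimal sketch of the one-level Chinese Restaurant Process that these nonparametric priors build on (the model on this slide uses the *nested* CRP over tree paths, which is not implemented here): customer n joins an existing table with probability proportional to its occupancy, or opens a new table with probability proportional to the concentration parameter gamma.

```python
import random

def crp_partition(n_customers, gamma=1.0, seed=0):
    """Sample a partition of customers into tables from a CRP(gamma)."""
    rng = random.Random(seed)
    tables = []        # tables[k] = number of customers seated at table k
    assignments = []
    for n in range(n_customers):
        r = rng.random() * (n + gamma)  # total mass: n seated + gamma for new
        k = len(tables)                 # default: open a new table
        for t, occupancy in enumerate(tables):
            r -= occupancy
            if r <= 0:
                k = t
                break
        if k == len(tables):
            tables.append(0)
        tables[k] += 1
        assignments.append(k)
    return assignments, tables

assignments, tables = crp_partition(20, gamma=1.0)
```

The "rich get richer" dynamics (large tables attract more customers) are what let the model grow the class hierarchy with the data instead of fixing its size in advance.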
71. Hierarchical Deep Model. The tree hierarchy of classes is learned. Nested Chinese Restaurant Process prior: a nonparametric prior over tree structures. Hierarchical Dirichlet Process prior: a nonparametric prior allowing categories to share higher-level features, or parts (K topics). Unlike standard statistical models, in addition to inferring the parameters, we also infer the hierarchy for sharing those parameters. Conditional Deep Boltzmann Machine: enforce (approximate) global consistency through many local constraints.
72. CIFAR Object Recognition. 50,000 images of 100 classes; 32 × 32 pixels × 3 RGB; 4 million unlabeled images. The tree hierarchy of classes is learned: higher-level class-sensitive features; lower-level generic features. Inference: Markov chain Monte Carlo (later!).
73. Learning to Learn. The model learns how to share knowledge across many visual categories. Learned super-class hierarchy: "global"; "aquatic animal" (dolphin, turtle, shark, ray); "fruit" (apple, orange, sunflower); "human" (girl, baby, man, woman); ... basic-level classes. Learned higher-level class-sensitive features; learned low-level generic features.
74. Learning to Learn. The model learns how to share knowledge across many visual categories. [Figure: the full learned super-class hierarchy ("global"; "aquatic animal"; "fruit"; "human"; and others) over basic-level classes such as dolphin, whale, ray, shark, turtle; apple, orange, pear, pepper, sunflower; man, woman, boy, girl, baby; crocodile, snake, lizard, spider; bus, truck, train, tractor, streetcar, tank; castle, road, bridge, skyscraper, house; pine, oak, maple, willow tree; bottle, can, lamp, bowl, cup; together with the learned higher-level class-sensitive features and low-level generic features.]
75. Sharing Features. "fruit": apple, orange, sunflower. Real images vs. reconstructions; shape; color (dolphin, sunflower, orange, apple). [Figure: sunflower ROC curves (detection rate vs. false-alarm rate) for 1, 3, and 5 examples, compared to pixel-space distance.] Learning to learn: learning a hierarchy for sharing parameters enables rapid learning of a novel concept.
76. Object Recognition. Area under the ROC curve for same/different (1 new class vs. 99 distractor classes). [Plot: AUROC vs. number of examples (1, 3, 5, 10, 50), averaged over 40 test classes, for HDP-DBM, HDP-DBM (no super-classes), DBM (class conditional), LDA, and GIST.] Our model outperforms standard computer vision features (e.g. GIST).
77. Handwritten Character Recognition. "alphabet 1", "alphabet 2"; char 1 ... char 5. Learned higher-level features: strokes. Learned lower-level features: edges. 25,000 characters.
78. Handwritten Character Recognition. Area under the ROC curve for same/different (1 new class vs. 1,000 distractor classes). [Plot: AUROC vs. number of examples (1, 3, 5, 10), averaged over 40 test classes, for HDP-DBM, HDP-DBM (no super-classes), DBM (class conditional), LDA, and pixels.]
79. Simulating New Characters. Real data within a super-class: global; super-class 1; super-class 2; class 1; class 2; new class. Simulated new characters.
80.–85. Simulating New Characters (the same layout repeated as the simulation progresses).
86. Learning from Very Few Examples. 3 examples of a new class; conditional samples in the same class; inferred super-class.
87.–94. Learning from Very Few Examples (the same demonstration repeated for further new classes).
95. Hierarchical-Deep Models. So far we have considered directed + undirected models. Deep Lambertian Networks: combine the elegant properties of the Lambertian model with Gaussian RBMs (and Deep Belief Nets, Deep Boltzmann Machines). [Diagram: deep undirected part ("animal", "vehicle"; K topics; horse, cow, car, van, truck) plus a directed part.] Low-level features: replace GIST, SIFT. Tang et al., ICML 2012
96. Deep Lambertian Networks. Model specifics: the observed image is generated from inferred surface normals, albedo, and light source, combined through the Deep Lambertian net. Inference: Gibbs sampler. Learning: stochastic approximation.
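The Lambertian image-formation step underlying this model can be sketched in a few lines (illustration only: the DLN additionally places a Gaussian-RBM/DBM prior over the albedo and surface normals, which is omitted here). Each pixel intensity is albedo times max(0, normal · light direction).

```python
import math

def render_lambertian(albedo, normals, light):
    """Render per-pixel intensities under the Lambertian reflectance model.

    albedo:  list of per-pixel reflectance values
    normals: list of unit surface-normal 3-vectors, one per pixel
    light:   light-source direction 3-vector (normalized internally)
    """
    norm = math.sqrt(sum(c * c for c in light))
    l = [c / norm for c in light]
    return [a * max(0.0, sum(ni * li for ni, li in zip(n, l)))
            for a, n in zip(albedo, normals)]

# Relighting: keep the (inferred) albedo and normals fixed, change the light.
normals = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8)]     # two example unit normals
frontal = render_lambertian([1.0, 0.5], normals, (0.0, 0.0, 1.0))
side = render_lambertian([1.0, 0.5], normals, (1.0, 0.0, 0.0))
```

This separation is what makes the face-relighting results on the next slide possible: once albedo and normals are inferred, any new light direction can be rendered.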
97. Deep Lambertian Networks. Extended Yale B database. One test image; two test images. Face relighting.
98. Deep Lambertian Networks. Recognition as a function of the number of training images for 10 test subjects. One-shot recognition.
99. Recursive Neural Networks. Recursive structure learning: local recursive networks predict whether to merge the two inputs, as well as predicting the label. Use max-margin estimation. Socher et al., ICML 2011
100. Recursive Neural Networks. Recursive structure learning. Socher et al., ICML 2011
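A greedy sketch of the merging idea on these slides (simplified: the cited work trains with max-margin estimation and also predicts labels, both omitted here, and the parameters below are random rather than learned). Adjacent node vectors are repeatedly scored, and the best-scoring pair is merged into a parent p = tanh(W [c1; c2] + b) until one node spans the whole input.

```python
import math
import random

def matvec(W, v):
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def greedy_parse(nodes, W, b, score_w):
    """Greedily merge adjacent d-dim vectors into a single root vector."""
    while len(nodes) > 1:
        best_i, best_score, best_parent = None, None, None
        for i in range(len(nodes) - 1):
            pre = matvec(W, nodes[i] + nodes[i + 1])       # W [c1; c2]
            parent = [math.tanh(x + bj) for x, bj in zip(pre, b)]
            score = sum(sw * pj for sw, pj in zip(score_w, parent))
            if best_score is None or score > best_score:
                best_i, best_score, best_parent = i, score, parent
        nodes = nodes[:best_i] + [best_parent] + nodes[best_i + 2:]
    return nodes[0]

# Toy usage with random parameters (d = 3).
rng = random.Random(0)
d = 3
W = [[rng.uniform(-0.5, 0.5) for _ in range(2 * d)] for _ in range(d)]
b = [0.0] * d
score_w = [rng.uniform(-0.5, 0.5) for _ in range(d)]
root = greedy_parse([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                    W, b, score_w)
```

Because the same W is applied at every merge, the network composes arbitrarily deep trees with a fixed number of parameters, which is the key structural idea.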
101. Learning from Few Examples. SUN database: car, van, truck, bus. Classes sorted by frequency. Rare objects are similar to frequent objects.
102. Learning from Few Examples. Chair: armchair, swivel chair, deck chair. Classes sorted by frequency.
103. Generative Model of Classifier Parameters. Many state-of-the-art object detection systems use sophisticated models based on multiple parts with separate appearance and shape components. Detect objects by testing sub-windows and scoring the corresponding test patches with a linear function. Define a hierarchical prior over the parameters of the discriminative model and learn the hierarchy. Image-specific: concatenation of the HOG feature pyramid at multiple scales. Felzenszwalb, McAllester & Ramanan, 2008
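The detection step described here, scoring sub-windows with a linear function, can be sketched minimally (the 1-D "features" below are a made-up stand-in for a real multi-scale HOG feature pyramid, and the template is hand-picked rather than learned):

```python
def score_windows(features, w):
    """Score every length-len(w) window of a 1-D feature map with w . phi(x)."""
    k = len(w)
    return [sum(wi * fi for wi, fi in zip(w, features[i:i + k]))
            for i in range(len(features) - k + 1)]

def detect(features, w, threshold):
    """Return the start indices of windows whose linear score exceeds threshold."""
    return [i for i, s in enumerate(score_windows(features, w)) if s > threshold]

features = [0.0, 0.1, 0.9, 1.0, 0.8, 0.1, 0.0]   # toy feature map with one peak
template = [1.0, 1.0, 1.0]                       # fires on runs of high activation
hits = detect(features, template, threshold=2.0)
```

The hierarchical prior of this slide would sit on top of such templates: templates for related categories (e.g. armchair and swivel chair) share a common super-class mean, so rare categories can borrow a template from frequent ones.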
104. Generative Model of Classifier Parameters. Hierarchical Bayes: by learning hierarchical structure, we can improve on the current state of the art. SUN dataset: 32,855 examples of 200 categories. [Figure: detections with 185, 27, and 12 training examples; hierarchical model vs. single-class model.]
105. Talk Roadmap. • Unsupervised Feature Learning – Restricted Boltzmann Machines – Deep Belief Networks – Deep Boltzmann Machines • Transfer Learning with Deep Models • Multimodal Learning. [Diagram: RBM, DBM.]
106. Multi-Modal Input. Learning systems that combine multiple input domains: images, text and language, video, laser scans, speech and audio, time-series data. Develop learning systems that come closer to displaying human-like intelligence. One of the key challenges: inference.
107. Multi-Modal Input. Learning systems that combine multiple input domains: image and text. More robust perception. Ngiam et al., ICML 2011 used deep autoencoders (video + speech).
• Guillaumin, Verbeek, and Schmid, CVPR 2011
• Huiskes, Thomee, and Lew, Multimedia Information Retrieval, 2010
• Xing, Yan, and Hauptmann, UAI 2005
108. Training Data. Samples from the MIR Flickr dataset (Creative Commons license), each image paired with its user tags, e.g.: "pentax, k10d, kangarooisland, southaustralia, sa, australia, australiansealion, 300mm"; "camera, jahdakine, lightpainting, reflection, doublepaneglass, wowiekazowie"; "sandbanks, lake, lakeontario, sunset, walking, beach, purple, sky, water, clouds, overtheexcellence, top20butterflies"; "mickikrimmel, mickipedia, headshot"; <no text>.