1. Tera-scale deep learning. Quoc V. Le, Stanford University and Google
2. Joint work with: Kai Chen, Greg Corrado, Jeff Dean, Matthieu Devin, Rajat Monga, Andrew Ng, Marc'Aurelio Ranzato, Paul Tucker, Ke Yang. Additional thanks: Samy Bengio, Zhenghao Chen, Tom Dean, Pangwei Koh, Mark Mao, Jiquan Ngiam, Patrick Nguyen, Andrew Saxe, Mark Segal, Jon Shlens, Vincent Vanhoucke, Xiaoyun Wu, Peng Xe, Serena Yeung, Will Zou
3. Machine Learning successes: Face recognition, OCR, Autonomous car, Email classification, Recommendation systems, Web page ranking
4. Feature Extraction: feature extraction (mostly hand-crafted features) followed by a classifier
5. Hand-Crafted Features. Computer vision: SIFT/HOG, SURF, ... Speech recognition: MFCC, Spectrogram, ZCR, ...
6. New feature-designing paradigm: Unsupervised Feature Learning / Deep Learning (Reconstruction ICA). Expensive and typically applied to small problems.
7. The Trend of Big Data
8. Outline. No matter the algorithm, more features are always more successful. - Reconstruction ICA - Applications to videos, cancer images - Ideas for scaling up - Scaling-up results
9. Topographic Independent Component Analysis (TICA)
10. TICA. 1. Feature computation: pooled features built from squared filter responses (w_j^T x)^2. 2. Learning the filter matrix W = [w_1; w_2; ...; w_10000] from the input data.
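As a reference, a sketch of the standard TICA formulation used in the cited papers (the pooling matrix H and the exact form of the constraint are reconstructed here, not taken verbatim from the slide):

    % pooled (invariant) feature k for input x, with fixed pooling matrix H
    % and learned filter rows w_j of W
    p_k(x; W) = \sqrt{ \sum_j H_{kj} \, (w_j^\top x)^2 }

    % TICA learning: sparse pooled activations under an orthonormality constraint
    \min_W \; \sum_{i=1}^{m} \sum_k p_k\big(x^{(i)}; W\big)
    \quad \text{subject to} \quad W W^\top = I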
11. Topographic Independent Component Analysis (TICA)
12. Invariance explained. Two images contain the same edge at two different locations (Loc1 and Loc2). The feature responses are (F1, F2) = (1, 0) for Image1 and (0, 1) for Image2. The pooled feature of F1 and F2 is sqrt(1^2 + 0^2) = 1 and sqrt(0^2 + 1^2) = 1: the same value regardless of the location of the edge.
13. TICA vs. Reconstruction ICA. Equivalence between Sparse Coding, Autoencoders, RBMs and ICA. Build a deep architecture by treating the output of one layer as input to another layer. Le, et al., ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS 2011
14. Reconstruction ICA. Le, et al., ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS 2011
15. Reconstruction ICA: data whitening. Le, et al., ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS 2011
16. TICA vs. Reconstruction ICA: data whitening. Le, et al., ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS 2011
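A sketch of the Reconstruction ICA objective from the cited NIPS 2011 paper (the pooled form and the symbols below are reconstructed, not verbatim from the slide): the hard orthonormality constraint of TICA is replaced by a soft reconstruction penalty, so W can be overcomplete and trained with ordinary unconstrained optimizers on whitened data x^(i):

    \min_W \; \frac{\lambda}{m} \sum_{i=1}^{m} \big\| W^\top W x^{(i)} - x^{(i)} \big\|_2^2
    \;+\; \sum_{i=1}^{m} \sum_k \sqrt{\epsilon + \sum_j H_{kj} \, \big(w_j^\top x^{(i)}\big)^2}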
17. Why RICA? Comparison of algorithms (Sparse Coding, RBMs/Autoencoders, TICA, Reconstruction ICA) on speed, ease of training, and invariant features. Le, et al., ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS 2011
18. Summary of RICA: - Two-layered network - Reconstruction cost instead of orthogonality constraints - Learns invariant features
19. Applications of RICA
20. Action recognition: Sit up, Drive car, Get out of car, Eat, Answer phone, Kiss, Run, Stand up, Shake hands. Le, et al., Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR 2011
21. Le, et al., Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR 2011
22. Action recognition benchmarks (accuracy bar charts on KTH, Hollywood2, UCF and YouTube): learned features compared against engineered features such as Hessian/SURF, HOG, HOF, HOG/HOF, HOG3D, pLSA, GRBMs, 3DCNN and HMAX. Le, et al., Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR 2011
23. Cancer classification (apoptotic, viable tumor region, necrosis): accuracy bar chart (84%-92%) comparing hand-engineered features with RICA. Le, et al., Learning Invariant Features of Tumor Signatures. ISBI 2012
24. Scaling up deep RICA networks
25. Scaling up Deep Learning: deep learning data vs. real data
26. It's better to have more features! No matter the algorithm, more features are always more successful. Coates, et al., An Analysis of Single-Layer Networks in Unsupervised Feature Learning. AISTATS 2011
27. Most are local features
28. Local receptive field networks: RICA features computed on an image, partitioned across Machine #1, Machine #2, Machine #3 and Machine #4. Le, et al., Tiled Convolutional Neural Networks. NIPS 2010
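To make the partitioning concrete, here is a minimal single-process sketch of the idea only, assuming hypothetical sizes (PATCH_SIZE, FEATURES_PER_MACHINE are illustrative, not from the talk): each feature reads a small local patch of the image, so disjoint groups of features and their weights can be assigned to different machines.

    import numpy as np

    # Minimal sketch of local receptive fields (illustrative, not the actual system).
    # Each feature connects only to a small patch of the image, so the feature set
    # can be split into groups and each group computed on a different machine.

    IMAGE_SIZE = 200          # 200x200 input images, as in the talk
    PATCH_SIZE = 18           # assumed receptive-field size (hypothetical)
    FEATURES_PER_MACHINE = 4
    NUM_MACHINES = 4

    rng = np.random.default_rng(0)

    def random_patch_location():
        """Pick the top-left corner of a local receptive field."""
        r = rng.integers(0, IMAGE_SIZE - PATCH_SIZE)
        c = rng.integers(0, IMAGE_SIZE - PATCH_SIZE)
        return r, c

    # Each "machine" owns a disjoint set of features: a filter plus the patch it reads.
    machines = []
    for _ in range(NUM_MACHINES):
        features = [(random_patch_location(),
                     rng.standard_normal((PATCH_SIZE, PATCH_SIZE)))
                    for _ in range(FEATURES_PER_MACHINE)]
        machines.append(features)

    def compute_features_on_machine(features, image):
        """Responses of one machine's features; only the local patches are needed,
        so only those pixels would have to be shipped to that machine."""
        responses = []
        for (r, c), w in features:
            patch = image[r:r + PATCH_SIZE, c:c + PATCH_SIZE]
            responses.append(float(np.sum(w * patch)))
        return responses

    image = rng.standard_normal((IMAGE_SIZE, IMAGE_SIZE))
    all_responses = [compute_features_on_machine(f, image) for f in machines]
    print([len(r) for r in all_responses])  # 4 features per machine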
29. Challenges with 1000s of machines
30. Asynchronous Parallel SGDs: parameter server. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
31. Asynchronous Parallel SGDs: parameter server (continued). Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
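As an illustration only (Python threads standing in for machines; the ParameterServer class and worker function are hypothetical names, not the actual system's API), asynchronous parallel SGD has each worker fetch possibly stale parameters, compute a gradient on its own mini-batch, and push the update to a shared parameter server without waiting for the other workers:

    import threading
    import numpy as np

    # Minimal single-process sketch of asynchronous parallel SGD with a parameter
    # server (illustrative only; the real system shards parameters over many
    # machines and communicates over RPC).

    class ParameterServer:
        def __init__(self, dim):
            self.w = np.zeros(dim)
            self.lock = threading.Lock()

        def fetch(self):
            with self.lock:
                return self.w.copy()

        def apply_gradient(self, grad, lr=0.1):
            with self.lock:
                self.w -= lr * grad

    def worker(server, data, labels, steps=200, batch=32):
        """Fetch (possibly stale) parameters, compute a mini-batch gradient,
        push the update back asynchronously."""
        rng = np.random.default_rng()
        for _ in range(steps):
            w = server.fetch()
            idx = rng.integers(0, len(data), size=batch)
            x, y = data[idx], labels[idx]
            grad = 2.0 * x.T @ (x @ w - y) / batch   # least-squares gradient
            server.apply_gradient(grad)

    # Toy problem: recover w_true from noisy linear measurements.
    rng = np.random.default_rng(0)
    w_true = rng.standard_normal(10)
    data = rng.standard_normal((5000, 10))
    labels = data @ w_true + 0.01 * rng.standard_normal(5000)

    server = ParameterServer(dim=10)
    threads = [threading.Thread(target=worker, args=(server, data, labels))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print("error:", np.linalg.norm(server.fetch() - w_true))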
32. Summary of scaling up: - Local connectivity - Asynchronous SGDs ... and more: - RPC vs. MapReduce - Prefetching - Single vs. double precision - Removing slow machines - Optimized softmax - ...
33. 10 million 200x200 images, 1 billion parameters
34. Training. Dataset: 10 million 200x200 unlabeled images from YouTube/Web. Trained on 2,000 machines (16,000 cores) for 1 week. Stacked RICA layers on top of the image, 1.15 billion parameters: 100x larger than previously reported, yet small compared to the visual cortex. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
35. The face neuron: top stimuli from the test set; optimal stimulus obtained by numerical optimization. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
36. Histogram of the face neuron's feature values (frequency vs. feature value): faces vs. random distractors. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
37. Invariance properties: feature response plotted against horizontal shifts (0 to 20 pixels), vertical shifts (0 to 20 pixels), 3D rotation angle (0 to 90 degrees), and scale factor (0.4x to 1.6x). Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
38. Top stimuli from the test set; optimal stimulus obtained by numerical optimization. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
39. Histogram of feature values (frequency vs. feature value): pedestrians vs. random distractors. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
40. Top stimuli from the test set; optimal stimulus obtained by numerical optimization. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
41. Histogram of feature values (frequency vs. feature value): cat faces vs. random distractors. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
42. ImageNet classification: 22,000 categories, 14,000,000 images. Prior approaches: hand-engineered features (SIFT, HOG, LBP), spatial pyramid, sparse coding/compression. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
43. 22,000 is a lot of categories... ... smoothhound, smoothhound shark, Mustelus mustelus; American smooth dogfish, Mustelus canis; Florida smoothhound, Mustelus norrisi; whitetip shark, reef whitetip shark, Triaenodon obseus; Atlantic spiny dogfish, Squalus acanthias; Pacific spiny dogfish, Squalus suckleyi; Stingray; hammerhead, hammerhead shark; smooth hammerhead, Sphyrna zygaena; smalleye hammerhead, Sphyrna tudes; shovelhead, bonnethead, bonnet shark, Sphyrna tiburo; angel shark, angelfish, Squatina squatina, monkfish; electric ray, crampfish, numbfish, torpedo; Mantaray; smalltooth sawfish, Pristis pectinatus; guitarfish; roughtail stingray, Dasyatis centroura; butterfly ray; eagle ray; spotted eagle ray, spotted ray, Aetobatus narinari; cownose ray, cow-nosed ray, Rhinoptera bonasus; manta, manta ray, devilfish; Atlantic manta, Manta birostris; devil ray, Mobula hypostoma; grey skate, gray skate, Raja batis; little skate, Raja erinacea; ...
44. Best stimuli: Feature 1, Feature 2, Feature 3, Feature 4, Feature 5. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
45. Best stimuli: Feature 6, Feature 7, Feature 8, Feature 9. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
46. Best stimuli: Feature 10, Feature 11, Feature 12, Feature 13. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
47. ImageNet (22,000 categories) accuracy: random guess 0.005%; state-of-the-art (Weston, Bengio '11) 9.5%; feature learning from raw pixels: ? Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
48. ImageNet (22,000 categories) accuracy: random guess 0.005%; state-of-the-art (Weston, Bengio '11) 9.5%; feature learning from raw pixels 15.8%. ImageNet 2009 (10k categories): best published result 17% (Sanchez & Perronnin '11); our method 20%. Using only 1,000 categories, our method > 50%. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
49. Other results. No matter the algorithm, more features are always more successful. We also have great features for: speech recognition, and word-vector embeddings for NLP.
50. Conclusions: • RICA learns invariant features • A face neuron emerges from totally unlabeled data, given enough training and data • State-of-the-art performance on: Action recognition, Cancer image classification, ImageNet (random guess 0.005%, best published result 9.5%, our method 15.8%)
51. References
• Q.V. Le, M.A. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, A.Y. Ng. Building high-level features using large-scale unsupervised learning. ICML, 2012.
• Q.V. Le, J. Ngiam, Z. Chen, D. Chia, P. Koh, A.Y. Ng. Tiled Convolutional Neural Networks. NIPS, 2010.
• Q.V. Le, W.Y. Zou, S.Y. Yeung, A.Y. Ng. Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR, 2011.
• Q.V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, A.Y. Ng. On optimization methods for deep learning. ICML, 2011.
• Q.V. Le, A. Karpenko, J. Ngiam, A.Y. Ng. ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS, 2011.
• Q.V. Le, J. Han, J. Gray, P. Spellman, A. Borowsky, B. Parvin. Learning Invariant Features for Tumor Signatures. ISBI, 2012.
• I.J. Goodfellow, Q.V. Le, A.M. Saxe, H. Lee, A.Y. Ng. Measuring invariances in deep networks. NIPS, 2009.
http://ai.stanford.edu/~quocle