Intro to Neural Networks

Dean Wyatte
Boulder Data Science
@drwyatte

June 9, 2016
  
Neural Networks

•  AI summer is here!
•  In the last year, NNs have
   –  Continued state-of-the-art advancements in image and speech recognition
   –  Beaten a human player in Go
   –  Provided some quantification of “art”
  
	
  
About me
  
How does your brain work?

•  100,000,000,000 neurons
•  10,000 dendritic inputs per neuron
•  1 electrical output
  
One simple abstraction

[Diagram of a neuron: dendritic input → synaptic weights → soma → axonal output]
  
Digression into regression

•  Linear regression
•  Logistic regression
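The bridge from this slide to neural networks: logistic regression is exactly one “neuron”, a weighted sum of inputs pushed through a sigmoid. A minimal pure-Python sketch (the inputs and weights below are hypothetical, picked only for illustration):

```python
import math

def logistic_neuron(x, w, b):
    """Logistic regression = weighted sum of inputs squashed through a sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear part (like linear regression)
    return 1.0 / (1.0 + math.exp(-z))             # sigmoid squashes to (0, 1)

# Hand-picked example weights: output rises with x[0], falls with x[1]
p = logistic_neuron(x=[2.0, 1.0], w=[1.5, -0.5], b=-1.0)
print(round(p, 3))  # -> 0.818, interpretable as a probability
```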
  
How to learn the weights?

•  If we know what the output should look like, we can compute the error and update the weights to minimize it
   –  Optimization problem; typically solved with gradient descent

[Diagram: network Output compared against Correct output; the difference is the Error]
Gradient descent

•  Given a cost function
   –  MSE
   –  Cross-entropy
   –  etc.
•  Can take a step in the opposite direction of the cost gradient by computing the derivative w.r.t. the weights
•  Scale by a learning rate (tiny step)
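The bullet points above amount to a few lines of code. A minimal sketch, fitting y = w·x to made-up data with an MSE cost, plain gradient descent, and a small learning rate:

```python
# Fit y = w * x by gradient descent on the MSE cost.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # toy data generated with true w = 2

w = 0.0          # initial weight
lr = 0.01        # learning rate: keep each step tiny

for _ in range(500):
    # dMSE/dw = mean of 2 * (w*x - y) * x over the data
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad   # step opposite the gradient

print(round(w, 3))  # -> 2.0, the weight that minimizes the cost
```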
  
A brief history of neural networks: The Perceptron

~1960: “The perceptron”
Universal function approximator

AND
  x1  x2 | y
   0   0 | 0
   0   1 | 0
   1   0 | 0
   1   1 | 1
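A single perceptron can realize the AND table above: weighted sum of inputs, then a hard threshold. A minimal sketch (the weights and threshold are one of many workable choices, not taken from the slides):

```python
def perceptron(x1, x2, w1=1.0, w2=1.0, threshold=1.5):
    """Classic perceptron: fire (1) iff the weighted input sum clears the threshold."""
    return 1 if w1 * x1 + w2 * x2 > threshold else 0

# Reproduces the AND truth table
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron(x1, x2))
```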
  
A brief history of neural networks: The Perceptron

~1960: “The perceptron”
Universal function approximator
  
A brief history of neural networks: The Perceptron

XOR?
  x1  x2 | y
   0   0 | 0
   0   1 | 1
   1   0 | 1
   1   1 | 0

…but only if the function is linearly separable
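A hidden layer is what rescues XOR: no single line separates its classes, but two threshold units feeding a third do the job. A sketch with hand-set weights (one textbook construction, XOR = OR and not AND; the specific thresholds are illustrative, not from the slides):

```python
def step(z, threshold):
    """Hard-threshold activation, as in the original perceptron."""
    return 1 if z > threshold else 0

def xor_mlp(x1, x2):
    """Two-layer perceptron: hidden units compute OR and AND, output combines them."""
    h_or = step(x1 + x2, 0.5)       # fires unless both inputs are 0
    h_and = step(x1 + x2, 1.5)      # fires only when both inputs are 1
    return step(h_or - h_and, 0.5)  # OR and not AND = XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_mlp(x1, x2))
```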
  
A brief history of neural networks: Next ~30 years

•  Neural network research halts (AI winter)
•  Meanwhile…
   –  Support Vector Machine (SVM) invented; solves non-linear problems
•  Shift toward separation of feature representation and classification
   –  Handcraft the best features, train the SVM (or current state-of-the-art) to do the classification
•  Eventually, the multi-layer perceptron generalization is realized; solves non-linear problems
   –  Nobody cares…

https://www.youtube.com/watch?v=3liCbRZPrZA
  
Handcrafted artisanal features

•  Discovering good features is hard!
   –  Requires a lot of domain knowledge
   –  State of the art in computer vision was the culmination of years of collaboration between computer vision scientists, neuroscientists, etc.
•  Neural networks automatically learn features (weights) from examples based on the task
   –  Each neuron is a “feature detector” that activates proportionately to how well its input matches its weights
   –  Deep learning: shift back from hand-crafted features to features learned from the task

General learning methods for robust feature representation and classification
[Diagram: deep network with layers Hidden 1, Hidden 2, Hidden 3]
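“Activates proportionately to how well its input matches its weights” is, concretely, a dot product. A toy sketch (the “edge template” weights are invented for illustration):

```python
# A neuron's weights act as a template; activation = dot(input, weights).
weights = [-1.0, -1.0, 1.0, 1.0]   # hypothetical "edge" template: dark-to-bright

def activation(inputs, weights):
    """Response is largest when the input pattern lines up with the weights."""
    return sum(i * w for i, w in zip(inputs, weights))

print(activation([0.0, 0.0, 1.0, 1.0], weights))  # matching input -> strong response: 2.0
print(activation([1.0, 1.0, 0.0, 0.0], weights))  # opposite input -> suppressed: -2.0
print(activation([0.5, 0.5, 0.5, 0.5], weights))  # flat input -> no response: 0.0
```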
  
A brief history of neural networks: Deep learning bandwagon

•  A handful of researchers still toiling away on neural networks with little-to-no recognition
   –  2012: One grad student studying how to implement neural networks on GPUs submits the first “deep learning” architecture to an image recognition challenge, wins by a landslide
   –  2013: Almost every submission is a deep neural network executed on a GPU (continuing trend)

First deep neural network
  
AlexNet

•  8 layers
•  650,000 “neurons” (units)
•  60,000,000 learned parameters
•  630,000,000 connections
•  Uses the same basic algorithm as the multi-layer perceptron to learn weights
•  Finally caught on because
   –  Can do it “fast” (~1 week in 2012) thanks to GPU-based computation
   –  Actually works, with less overfitting, thanks to tricks and massive amounts of data
  
AlexNet

[Figure: 96 learned 11x11-pixel filter weights from ImageNet, alongside handcrafted Textons]
[Figure: classifications of unseen images]
  
Neural Networks in 2016

•  Variety of libraries that specify inputs as tensor minibatches and automatically compute gradients
   –  Tensorflow
   –  Theano (Keras/Lasagne)
   –  Torch
•  Libraries also available for common neural network layer types
   –  Convolutional, activation, pooling, dropout, RNN, etc.
•  Almost too easy
   –  Mind the danger zone!
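What these libraries automate can be imitated by hand: take a minibatch, define a cost, and obtain a gradient for every weight without deriving it yourself. A pure-Python sketch that uses finite differences as a crude stand-in for the libraries’ real automatic differentiation (model, data, and learning rate are all made up for illustration):

```python
# Frameworks like TensorFlow/Theano/Torch automate this: given a cost over a
# minibatch, produce d(cost)/d(weight) for every weight. Finite differences
# stand in here for their (much faster, exact) automatic differentiation.
batch = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]  # minibatch of (input, target)

def cost(w):
    # mean squared error of a linear model over the minibatch
    return sum((w[0] * x[0] + w[1] * x[1] - y) ** 2 for x, y in batch) / len(batch)

def numeric_grad(f, w, eps=1e-6):
    # central finite difference in each weight dimension
    grads = []
    for i in range(len(w)):
        w_hi = list(w); w_hi[i] += eps
        w_lo = list(w); w_lo[i] -= eps
        grads.append((f(w_hi) - f(w_lo)) / (2 * eps))
    return grads

w = [0.0, 0.0]
for _ in range(200):                  # plain gradient descent
    g = numeric_grad(cost, w)
    w = [wi - 0.5 * gi for wi, gi in zip(w, g)]

print([round(wi, 2) for wi in w])  # -> [1.0, 0.0]
```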
  
Data science due diligence

“Neural Networks sound awesome and will solve all our problems!”

•  Significant investment in resources: GPU (TPU?) cluster, ramp-up on niche/rapidly-evolving tools
•  Long feedback loop for architecture improvement: typically launch many jobs and terminate bad models (see above)
•  Need a lot of high-dimensional data with variability (millions of unique observations and/or heavy data augmentation); delicate balance between increased predictive power and overfitting
•  Hard to debug when not working: millions of reasons (literally) a model can be wrong, few ways it can be right. “Black magic”
•  Deep nonlinear models suffer from interpretability issues: black-box models (although active research here)
  
Thanks

Manuel Ruder, Alexey Dosovitskiy, Thomas Brox (2016). Artistic style transfer for videos.
http://arxiv.org/abs/1604.08610
https://www.youtube.com/watch?v=Khuj4ASldmU
  
Resources

“This is cool, but I don’t (want to) code”
http://playground.tensorflow.org

“I am comfortable with the SciPy stack and want to understand more”
A Neural Network in 11 lines of Python
http://iamtrask.github.io/2015/07/12/basic-python-network/

“I am comfortable with ML libraries and want to build a model”

MNIST
•  Keras
https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py
•  Tensorflow
https://www.tensorflow.org/versions/r0.8/tutorials/mnist/pros/index.html

Variational Autoencoders (also using MNIST)
•  Keras
http://blog.keras.io/building-autoencoders-in-keras.html
•  Tensorflow
https://jmetzen.github.io/2015-11-27/vae.html
  
