Scene Classification using Convolutional Neural Networks
Jayani Withanawasam
  
Outline
•  Computer vision as an AI problem
•  Importance of scene classification and its challenges
•  Traditional machine learning vs. deep learning
•  Convolutional Neural Networks (CNNs)
•  Using Caffe for implementing CNNs
•  Important resources to proceed with…
  
Is this exercise familiar to you?
Scene understanding is a primary school task!
  
What do you see?
Photo credits: Kaushalya Madhawa
  
What do computers see?
Source: http://www.cs.washington.edu/research/metip/about/digital.html
  
Why should we understand visual data?
•  Billions of views are generated on YouTube every day
•  On Facebook, hundreds of millions of photos are uploaded per day
Can humans manually process such large volumes of data, generated at this rate, to instantly find useful insights?
  
Computer vision as an AI problem
•  Intelligent behavior of an agent requires the ability to effectively interact with and manipulate its environment
•  Detailed understanding of the external environment is achieved using visual perception
•  Computer vision provides methods to analyze images in order to understand objects and scenes
  
Using the forest to see the trees! (Torralba et al.)
Source: Using the forest to see the trees: exploiting context for visual object recognition and localization, Torralba et al.
  
Scene classification in computer vision
•  Main focus areas in computer vision
– Computer graphics
– Image recognition
•  Image recognition is based on concepts from artificial intelligence and cognitive science
•  Scene classification falls under image recognition
•  The scene classification problem differs from the object recognition problem because a scene (context) is composed of multiple objects
  
Scene classification in computer vision (continued)
Source: Srinivasa Narasimhan’s slide
  
In 1966, Marvin Minsky at MIT asked his undergraduate student Gerald Jay Sussman to spend the summer linking a camera to a computer and getting the computer to describe what it saw. We now know the problem is slightly more difficult than that ;)
Szeliski 2009, Computer vision
  
Challenges of scene classification
Source: Learning deep features for scene recognition using places database, Zhou et al.
  
Scene classification: then and now
Then: labeling segmentations of the scene (part-based models)
Now: analyzing the entire scene as a whole and training on the large volumes of data available
  
Deep Learning
•  Traditional machine learning algorithms
– Do not perform well in high-dimensional spaces
– Require expert knowledge to hand-engineer features
– High computational cost
•  Deep learning algorithms
– Specialized form of artificial neural network
– Representation learning for high-dimensional data
– Use GPUs to accelerate learning
  
Inspired by nature…
Source: Hubel and Wiesel experiment
•  Local receptive fields
•  Simple cells
•  Complex cells
  
Convolutional Neural Networks (CNNs)
•  Deep learning technique to recognize spatial patterns in data
•  Hierarchical organization of image features at different levels of abstraction
•  Type of Artificial Neural Network (ANN)
Assumption: You are familiar with basic Artificial Neural Networks (ANNs) and machine learning concepts
  
Historical CNN architectures
Source: Gradient-based learning applied to document recognition, LeCun et al., 1998
Source: ImageNet classification with deep convolutional neural networks, Krizhevsky et al., 2012
  
CNN architecture
•  Convolution layers
•  Sub-sampling (pooling) layers
•  Non-linearity layers (activation function)
•  Fully connected (FC) layer (optional)
Source: https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/
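To make the four layer types above concrete, here is a minimal NumPy sketch of a single forward pass: convolution, then a ReLU non-linearity, then max pooling, then a fully connected layer. It is an illustration only, not how Caffe implements these layers. The 8x8 random "image", the single 3x3 filter, the choice of ReLU and max pooling, and the four output classes are all assumptions picked to keep the example small.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide one filter over a single-channel image (no padding)."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def relu(x):
    """Non-linearity layer: keep positive responses, zero out the rest."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    trimmed = x[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

def fully_connected(x, weights, bias):
    """Flatten the feature map and compute one score per class."""
    return weights @ x.ravel() + bias

rng = np.random.default_rng(0)
image = rng.random((8, 8))             # stand-in for a grayscale image (assumption)
kernel = rng.standard_normal((3, 3))   # one filter; learned during training in a real CNN
features = max_pool(relu(conv2d(image, kernel)))    # 8x8 -> 6x6 -> 3x3
weights = rng.standard_normal((4, features.size))   # 4 hypothetical scene classes
scores = fully_connected(features, weights, np.zeros(4))
print(features.shape, scores.shape)
```

A real network stacks many such convolution/pooling stages with many filters per stage, which is what produces the hierarchy of features mentioned earlier.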
  
	
  
Important hyperparameters for CNNs
•  Number of filters (kernels)
•  Stride
•  Size of the filter
•  Amount of padding
•  Other (not CNN-specific)
– Learning rate (and its decay)
– Batch size
– Momentum
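Filter size, stride and padding together determine the spatial size of a convolution layer's output, using the standard relation out = (W - F + 2P) / S + 1, while the number of filters sets the output depth. The helper below is a small sketch of that arithmetic; the function name and the decision to raise an error when the filter does not tile the input evenly are assumptions made for illustration (frameworks typically round instead).

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    span = input_size - filter_size + 2 * padding
    if span < 0 or span % stride != 0:
        raise ValueError("filter, stride and padding do not tile the input evenly")
    return span // stride + 1

# 28-pixel input, 5-pixel filter, stride 1, no padding
print(conv_output_size(28, 5))                        # 24
# 227-pixel input, 11-pixel filter, stride 4, no padding
print(conv_output_size(227, 11, stride=4))            # 55
# padding of 1 with a 3-pixel filter preserves the input size
print(conv_output_size(32, 3, stride=1, padding=1))   # 32
```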
  
Caffe for CNN implementation
•  Convolutional Architecture for Fast Feature Embedding
•  Deep learning framework by the Berkeley Vision and Learning Center: http://caffe.berkeleyvision.org/
•  Reference models in the Caffe Model Zoo
•  Input (e.g., lmdb)
•  Net: layers (data, loss, convolution), e.g., lenet_train.prototxt
•  Solver (learning rate, net, model snapshots, validation), e.g., lenet_solver.prototxt
  
lenet_solver.prototxt
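The contents of the solver file are not reproduced in this text, so the following is only a plain-Python sketch of what a solver's learning rate, decay and momentum settings control. It assumes SGD with momentum and an "inv" learning-rate decay of the form base_lr * (1 + gamma * iter)^(-power). The numeric values and the toy objective f(w) = (w - 3)^2 are illustrative assumptions, not values taken from the slide.

```python
def inv_learning_rate(base_lr, gamma, power, iteration):
    """Learning rate that decays smoothly as the iteration count grows."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)

def sgd_momentum_step(weight, velocity, gradient, lr, momentum):
    """One SGD-with-momentum update: blend the previous step with the new gradient."""
    velocity = momentum * velocity - lr * gradient
    return weight + velocity, velocity

# Toy problem (assumption): minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
weight, velocity = 0.0, 0.0
for iteration in range(2000):
    lr = inv_learning_rate(base_lr=0.01, gamma=0.0001, power=0.75, iteration=iteration)
    gradient = 2.0 * (weight - 3.0)
    weight, velocity = sgd_momentum_step(weight, velocity, gradient, lr, momentum=0.9)

print(weight)  # ends up very close to the minimum at w = 3
```

In real training the gradient is averaged over a mini-batch, which is where the batch size hyperparameter enters.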
  
lenet_train.prototxt (a few important layers)
Data layer
Pooling layer
Convolutional layer

MIT Places for scene recognition
•  MIT Places database
•  Places2 Challenge
•  MIT Scene Recognition Demo
•  http://places.csail.mit.edu
  
Important resources
•  CS231n: Convolutional Neural Networks for Visual Recognition, Fei-Fei Li, Andrej Karpathy, Justin Johnson, Stanford University. http://cs231n.stanford.edu/
•  Deep Learning book, Ian Goodfellow, Yoshua Bengio, Aaron Courville. http://www.deeplearningbook.org/
  
We are not there yet…
Source: Concise Computer Vision
  
Contact me
•  LinkedIn: https://www.linkedin.com/in/jayaniwithanawasam
•  Email: jayaniwithanawasam@gmail.com
  
Thank you
  
