[course site]

Verónica Vilaplana
veronica.vilaplana@upc.edu
Associate Professor
Universitat Politècnica de Catalunya (Technical University of Catalonia)

Convolutional Neural Networks
Day 5 Lecture 1
#DLUPC
Index

• Motivation
  • Local connectivity
  • Parameter sharing
  • Pooling and subsampling
• Layers
  • Convolutional
  • Pooling
  • Fully connected
  • Activation functions
  • Batch normalization
  • Upsampling
• Examples
  
Motivation
Local connectivity
Parameter sharing
Pooling and subsampling
  
Neural networks for visual data

• Example: image recognition
  • Given some input image, identify which object it contains

(Figure: a "sunflower" example from the Caltech101 dataset; image size 150x112, so each fully connected neuron would be connected to 16800 inputs)
  
Neural networks for visual data

• We can design neural networks that are specifically adapted for such problems
  • must deal with very high-dimensional inputs
    • 150 x 112 pixels = 16800 inputs, or 3 x 16800 for RGB pixels
  • can exploit the 2D topology of pixels (or 3D for video data)
  • can build in invariance to certain variations we can expect
    • translations, illumination, etc.
• Convolutional networks are a specialized kind of neural network for processing data that has a known, grid-like topology. They leverage these ideas:
  • local connectivity
  • parameter sharing
  • pooling / subsampling of hidden units

Slide credit: H. Larochelle
  
Convolutional neural networks
Local connectivity

• First idea: use local connectivity of hidden units
  • each hidden unit is connected only to a subregion (patch) of the input image: its receptive field
  • it is connected to all channels
    • 1 for a grayscale image
    • 3 (R, G, B) for a color image
    • …
• Solves the following problems:
  • a fully connected hidden layer would have an unmanageable number of parameters
  • computing the linear activations of the hidden units would be very expensive

Slide credit: H. Larochelle
  
Convolutional neural networks
Parameter sharing

• Second idea: share the matrix of parameters across certain units
  • units organized into the same "feature map" share parameters
  • hidden units within a feature map cover different positions in the image
• Solves the following problems:
  • reduces the number of parameters even further
  • the same features are extracted at every position (features are "equivariant")

Wij is the matrix connecting the ith input channel with the jth feature map

Slide credit: H. Larochelle
  
Convolutional neural networks
Parameter sharing

• Each feature map forms a 2D grid of features
  • it can be computed with a discrete convolution of a kernel matrix kij, which is the hidden weights matrix Wij with its rows and columns flipped

y_j = f( Σ_i k_ij ∗ x_i )

where x_i is the ith channel of the input and y_j is the jth feature map of the hidden layer.

(Figure: input image, input channels, and the resulting feature map)

Slide credit: H. Larochelle
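As a small, added illustration of this formula (not part of the original slides; the array names and filter size are arbitrary), the NumPy/SciPy sketch below computes one feature map y_j by convolving each input channel with its kernel, summing over channels, and applying a nonlinearity f:

```python
# Minimal sketch of  y_j = f( Σ_i k_ij * x_i )  for one feature map j (assumes NumPy and SciPy)
import numpy as np
from scipy.signal import convolve2d

def feature_map(x, kernels, f=lambda z: np.maximum(z, 0.0)):
    """x: (C, H, W) input; kernels: (C, kh, kw), one kernel per input channel.
    Channel-wise 2D convolutions are summed and passed through the nonlinearity f."""
    acc = sum(convolve2d(x[i], kernels[i], mode="valid") for i in range(x.shape[0]))
    return f(acc)

x = np.random.rand(3, 150, 112)   # e.g. an RGB image of size 150x112
k = np.random.randn(3, 5, 5)      # one 5x5 kernel per input channel (arbitrary choice)
print(feature_map(x, k).shape)    # (146, 108) with a 5x5 'valid' convolution
```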
  
Convolutional neural networks

• Convolution as feature extraction (applying a filterbank)
  • but the filters are learned

Slide credit: H. Larochelle
  
Convolutional neural networks
Pooling and subsampling

• Third idea: pool hidden units in the same neighborhood
  • pooling is performed in non-overlapping neighborhoods (subsampling)
  • an alternative to "max" pooling is "average" pooling
  • pooling reduces dimensionality and provides invariance to small local changes

(Figure: max pooling example)

Slide credit: H. Larochelle
  
Convolutional Neural Networks

• Convolutional neural networks alternate between convolutional layers (followed by a nonlinearity) and pooling layers (basic architecture)
• For recognition: the output layer is a regular, fully connected layer with softmax nonlinearity
  • the output provides an estimate of the conditional probability of each class
• The network is trained by stochastic gradient descent (& variants)
  • backpropagation is used similarly as in a fully connected network to train the weights of the filters

Slide credit: H. Larochelle
  
Convolutional Neural Networks

'Classic' architecture for object recognition (BoW), vs.
CNN = learning hierarchical representations with increasing levels of abstraction

End-to-end training: joint optimization of features and classifier

Feature visualization of a convolutional net trained on ImageNet (Zeiler, Fergus, 2013)
  
Example: LeNet-5

• LeCun et al., 1998

MNIST digit classification problem:
  handwritten digits
  60,000 training examples
  10,000 test samples
  10 classes
  28x28 grayscale images

Conv filters were 5x5, applied at stride 1
Sigmoid or tanh nonlinearity
Subsampling (average pooling) layers were 2x2, applied at stride 2
Fully connected layers at the end
i.e. the architecture is [CONV-POOL-CONV-POOL-CONV-FC]

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86(11): 2278–2324, 1998.
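For illustration only (not from the original slides), here is a rough PyTorch sketch of a LeNet-5-style network with this [CONV-POOL-CONV-POOL-CONV-FC] layout; the layer widths and the tanh nonlinearity loosely follow the 1998 paper, so treat the details as an approximation:

```python
# Rough PyTorch sketch of a LeNet-5-style network (approximate, not the exact original)
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, stride=1, padding=2),  # 28x28 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),                # -> 14x14
            nn.Conv2d(6, 16, kernel_size=5, stride=1),            # -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),                # -> 5x5
            nn.Conv2d(16, 120, kernel_size=5, stride=1),          # -> 1x1
            nn.Tanh(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(120, 84), nn.Tanh(), nn.Linear(84, num_classes)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNet5()(torch.zeros(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```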
  
Layers
  
Convolutional Neural Networks

• In ConvNets the inputs are 'images' (the architecture is constrained)
• A ConvNet arranges its neurons in three dimensions (width, height, depth)
• Every layer transforms a 3D input volume into a 3D output volume of neuron activations

(Figure: a regular 3-layer neural network vs. a ConvNet with 3 layers)

Slide credit: Stanford cs231n 2017
  	
  
Convolutional layer

• Convolution on a 2D grid

(Image source)
  
Convolutional layer

• Convolution on a volume

32x32x3 input, 5x5x3 filter
Filters always extend the full depth of the input volume.
Convolve the filter with the input, i.e. slide it over the input spatially, computing dot products.

(Figure: input volume with width, height and depth axes)

Slide credit: Stanford cs231n 2017
  
Convolutional layer

• Convolution on a volume

32x32x3 input, 5x5x3 filter w
Convolve (slide) over all spatial locations to obtain an activation map (or feature map).
1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the input, i.e. a 5x5x3 = 75-dimensional dot product + bias: w^T x + b

Slide credit: Stanford cs231n 2017
  
Convolutional layer

32x32x3 input, 5x5x3 filter w
Convolve (slide) over all spatial locations.
Consider a second, green filter: it produces its own activation map.

Slide credit: Stanford cs231n 2017
  	
  
Convolutional layer

If we have 6 5x5x3 filters, we get 6 separate activation maps (feature maps).
We stack the maps up to get a 'new volume' of size 28x28x6.
So applying a filterbank to an input (3D matrix) yields a cube-like output, a 3D matrix in which each slice is the output of the convolution with one filter.

Slide credit: Stanford cs231n 2017
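To make the shape bookkeeping concrete, a short PyTorch check added here for illustration: six 5x5x3 filters applied to a 32x32x3 input give a 28x28x6 output volume.

```python
# Sketch: 6 filters of size 5x5x3 over a 32x32x3 input -> a 28x28x6 output volume
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5, stride=1, padding=0)
x = torch.randn(1, 3, 32, 32)   # (batch, channels, height, width)
y = conv(x)
print(y.shape)                  # torch.Size([1, 6, 28, 28]), i.e. 28x28x6
```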
  	
  
Convolutional layer

A ConvNet is a sequence of convolutional layers, interspersed with activation functions and pooling layers (and a small number of fully connected layers).
We add more layers of filters: we apply filters (convolutions) to the output volume of the previous layer, and the result of each convolution is a slice in the new volume.

Slide credit: Stanford cs231n 2017
  	
  
Example: filters and activation maps

Example CNN trained for image recognition on the CIFAR dataset.

The network learns filters that activate when they see some specific type of feature at some spatial position in the input. Stacking the activation maps for all filters along the depth dimension forms the full output volume.

http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
  
Convolution

7x7 input (spatially), assume 3x3 filter: 5x5 output (stride 1)

7x7 input (spatially), assume 3x3 filter applied with stride 2: 3x3 output

Hyperparameters: filter spatial extent F, stride S, padding P
Stride is the number of pixels by which we slide the filter matrix over the input matrix. A larger stride will produce smaller feature maps.

Slide credit: Stanford cs231n 2017
  	
  
Convolution

Output size: (N - F) / stride + 1
e.g. N=7, F=3
  stride 1: (7-3)/1 + 1 = 5
  stride 2: (7-3)/2 + 1 = 3
  stride 3: (7-3)/3 + 1 = 2.33  → not applied (the filter does not fit)

Common: zero-padding in the border.
E.g. N=7, F=3, stride 1, pad with a 1-pixel border: output size 7x7

In general, it is common to see CONV layers with stride 1, FxF filters, and zero-padding with P = (F-1)/2: this preserves the spatial size.

Hyperparameters: filter spatial extent F, stride S, padding P

Slide credit: Stanford cs231n 2017
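A tiny Python helper, added here as an illustration (the function name is my own), that applies this output-size rule, including the zero-padding case:

```python
# Sketch of the output-size rule used on this slide
def conv_output_size(n, f, stride, pad=0):
    """Spatial output size of a convolution: (N + 2P - F) / S + 1. Returns None if it does not fit."""
    num = n + 2 * pad - f
    return num // stride + 1 if num % stride == 0 else None

print(conv_output_size(7, 3, 1))         # 5
print(conv_output_size(7, 3, 2))         # 3
print(conv_output_size(7, 3, 3))         # None: (7-3)/3 + 1 is not an integer
print(conv_output_size(7, 3, 1, pad=1))  # 7: padding P=(F-1)/2 preserves the size
```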
  
1x1 convolutions

1x1 convolution layers are used to reduce dimensionality (the number of feature maps).
Example: a 1x1 conv with 32 filters; each filter has size 1x1x64 and performs a 64-dimensional dot product.

Slide credit: Stanford cs231n 2017
  	
  
Example: size, parameters

Input volume: 32x32x3
10 5x5 filters with stride 1, pad 2

Output volume size: (N + 2P - F) / S + 1
(32 + 2*2 - 5)/1 + 1 = 32 spatially, so 32x32x10

Number of parameters in this layer:
each filter has 5x5x3 + 1 = 76 params (+1 for the bias)
-> 76 x 10 = 760 parameters

Slide credit: Stanford cs231n 2017
Summary: conv layer

To summarize, the Conv Layer
• Accepts a volume of size W1 x H1 x D1
• Requires four hyperparameters
  • number of filters K
  • their spatial extent F
  • the stride S
  • the amount of zero padding P
• Produces a volume of size W2 x H2 x D2
  • W2 = (W1 - F + 2P) / S + 1
  • H2 = (H1 - F + 2P) / S + 1
  • D2 = K
• With parameter sharing, it introduces F·F·D1 weights per filter, for a total of (F·F·D1)·K weights and K biases
• In the output volume, the d-th depth slice (of size W2xH2) is the result of performing a valid convolution of the d-th filter over the input volume with a stride of S, and then offset by the d-th bias

Common settings:
K = powers of 2 (32, 64, 128, 256)
F=3, S=1, P=1
F=5, S=1, P=2
F=5, S=2, P=? (whatever fits)
F=1, S=1, P=0

Slide credit: Stanford cs231n 2017
  	
  
Activation functions

• Desirable properties: mostly smooth, continuous, differentiable, fairly linear

Sigmoid:     σ(x) = 1 / (1 + e^(-x))
tanh:        tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
ReLU:        max(0, x)
Leaky ReLU:  max(0.1x, x)
Maxout:      max(w1^T x + b1, w2^T x + b2)
ELU
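A minimal NumPy sketch of these activation functions, added for illustration (ELU is shown with α = 1):

```python
# Minimal NumPy sketch of the activation functions listed above
import numpy as np

def sigmoid(x):     return 1.0 / (1.0 + np.exp(-x))
def tanh(x):        return np.tanh(x)
def relu(x):        return np.maximum(0.0, x)
def leaky_relu(x):  return np.maximum(0.1 * x, x)
def elu(x, a=1.0):  return np.where(x > 0, x, a * (np.exp(x) - 1.0))
def maxout(x, w1, b1, w2, b2):
    return np.maximum(w1 @ x + b1, w2 @ x + b2)   # elementwise max of two linear units

x = np.linspace(-3, 3, 7)
print(relu(x))
print(leaky_relu(x))
```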
Activation functions

• Example
  
Pooling layer

• Makes the representations smaller and more manageable for later layers
• Useful to get invariance to small local changes
• Operates over each activation map independently

Slide credit: Stanford cs231n 2017
  	
  
Pooling layer

• Max pooling
  (Figure: a single depth slice, max pooled with 2x2 filters and stride 2)
• Other pooling functions: average pooling or L2-norm pooling

Slide credit: Stanford cs231n 2017
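A minimal NumPy sketch of 2x2 max pooling with stride 2 on a single depth slice, added for illustration:

```python
# Sketch of 2x2 max pooling with stride 2 on a single depth slice
import numpy as np

def max_pool_2x2(x):
    """x: (H, W) with even H, W. Returns the (H/2, W/2) max over each 2x2 block."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool_2x2(x))   # [[6 8]
                         #  [3 4]]
```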
  	
  
Pooling layer

• Example
  
Summary: pooling layer

To summarize, the pooling layer
• Accepts a volume of size W1 x H1 x D1
• Requires two hyperparameters
  • the spatial extent F
  • the stride S
• Produces a volume of size W2 x H2 x D2, where
  • W2 = (W1 - F) / S + 1
  • H2 = (H1 - F) / S + 1
  • D2 = D1
• Introduces zero parameters since it computes a fixed function of the input

Common settings:
F=2, S=2
F=3, S=2

Slide credit: Stanford cs231n 2017
  	
  
Fully connected layer

• At the end it is common to add one or more fully (or densely) connected layers.
• Every neuron in the previous layer is connected to every neuron in the next layer (as in regular neural networks). The activation is computed as a matrix multiplication plus a bias.
• At the output, a softmax activation is used for classification.

(Figure: 4 possible outputs; connections and weights not shown here)
  
Fully connected layers and Convolutional layers

• A convolutional layer can be implemented as a fully connected layer
  • The weight matrix is a large matrix that is mostly zero except for certain blocks (due to local connectivity)

(Figure: I input image 4x4, vectorized to 16x1; O output image 4x1, later reshaped to 2x2; h 3x3 kernel; C 16x4 weight matrix)
  
Fully connected layers and Convolutional layers

• Fully connected layers can also be viewed as convolutions with kernels that cover the entire input region
• Example:
  A fully connected layer with K=4096 neurons and input volume 32x32x512 can be expressed as a convolutional layer with F=32 (the size of the kernel), P=0, S=1 and K=4096 (number of filters):
  the filter size is exactly the size of the input volume, and the output is 1x1x4096
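A small PyTorch check of this equivalence, added for illustration; tiny sizes are used so it runs quickly, but the slide's 32x32x512 → 4096 case has exactly the same form:

```python
# Sketch (PyTorch): an FC layer viewed as a convolution whose kernel covers the whole input
import torch
import torch.nn as nn

C, H, W, K = 8, 4, 4, 16
x = torch.randn(1, C, H, W)

fc = nn.Linear(C * H * W, K)
conv = nn.Conv2d(C, K, kernel_size=(H, W), stride=1, padding=0)

# Reuse the FC weights in the conv layer so both compute the same function
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(K, C, H, W))
    conv.bias.copy_(fc.bias)

y_fc = fc(x.flatten(1))        # shape (1, K)
y_conv = conv(x)               # shape (1, K, 1, 1)
print(torch.allclose(y_fc, y_conv.flatten(1), atol=1e-5))  # True (up to float error)
```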
  
Batch normalization layer

• As learning progresses, the distribution of the layer inputs changes due to parameter updates (internal covariate shift)
• This can result in most inputs being in the non-linear regime of the activation function, slowing down learning
• Batch normalization is a technique to reduce this effect
  • Explicitly force the layer activations to have zero mean and unit variance w.r.t. running batch estimates
  • Adds a learnable scale and bias term to allow the network to still use the nonlinearity

x̂^(k) = (x^(k) − E[x^(k)]) / √(Var[x^(k)])
y^(k) = γ^(k) x̂^(k) + β^(k)

Typical placement: FC / Conv → Batch norm → ReLU → FC / Conv → Batch norm → ReLU → …

Ioffe and Szegedy, 2015. "Batch normalization: accelerating deep network training by reducing internal covariate shift"
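A minimal NumPy sketch of this transform, added for illustration; it uses training-mode batch statistics, whereas at test time the running estimates mentioned above would replace them:

```python
# Minimal NumPy sketch of the batch-norm transform above (training-mode batch statistics)
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: (batch, features). Normalize each feature k, then scale and shift with gamma, beta."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(128, 64) * 3.0 + 5.0          # a batch with non-zero mean, non-unit variance
y = batch_norm(x, gamma=np.ones(64), beta=np.zeros(64))
print(y.mean(axis=0)[:3], y.std(axis=0)[:3])      # ~0 and ~1 per feature
```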
Upsampling layers: recovering spatial shape

• Motivation: semantic segmentation. Make predictions for all pixels at once.

Problem: convolutions at the original image resolution would be very expensive.

Slide credit: Stanford cs231n 2017
  	
  
Upsampling layers: recovering spatial shape

• Motivation: semantic segmentation. Make predictions for all pixels at once.

Design the network as a sequence of convolutional layers with downsampling and upsampling.

Other applications: super-resolution, flow estimation, generative modeling

Slide credit: Stanford cs231n 2017
  	
  
Learnable upsampling

• Recall: 3x3 convolution, stride 1, pad 1

Slide credit: Stanford cs231n 2017
  	
  
Learnable upsampling

• Recall: 3x3 convolution, stride 2, pad 1

The filter moves 2 pixels in the input for every one pixel in the output.
The stride gives the ratio between movement in the input and in the output.

Slide credit: Stanford cs231n 2017
  	
  
Learnable upsampling: transposed convolution

• 3x3 transposed convolution, stride 2, pad 1

Sum where the outputs overlap.
The filter moves 2 pixels in the output for every one pixel in the input.
The stride gives the ratio between movement in the input and in the output.

Slide credit: Stanford cs231n 2017
  	
  
Learnable upsampling: transposed convolution

• 1D example

Other names:
- transposed convolution
- backward strided convolution
- fractionally strided convolution
- upconvolution
- "deconvolution"

More info:
Dumoulin et al., A guide to convolution arithmetic for deep learning, 2016
https://github.com/vdumoulin/conv_arithmetic
Shi et al., Is the deconvolution layer the same as a convolutional layer?, 2016

Slide credit: Stanford cs231n 2017
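In PyTorch this operation is available as ConvTranspose2d; the short sketch below (added for illustration) shows a stride-2 transposed convolution roughly doubling the spatial size, and a stride-2 convolution mapping it back:

```python
# Sketch (PyTorch): stride-2 transposed convolution upsamples by roughly a factor of 2
import torch
import torch.nn as nn

x = torch.randn(1, 16, 8, 8)
up = nn.ConvTranspose2d(in_channels=16, out_channels=8, kernel_size=3,
                        stride=2, padding=1, output_padding=1)
print(up(x).shape)          # torch.Size([1, 8, 16, 16])

# Compare with a stride-2 (downsampling) convolution going the other way
down = nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1)
print(down(up(x)).shape)    # torch.Size([1, 16, 8, 8])
```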
  	
  
Learnable upsampling

It is always possible to emulate a transposed convolution with a direct convolution (with fractional stride).

More info:
Dumoulin et al., A guide to convolution arithmetic for deep learning, 2016
https://github.com/vdumoulin/conv_arithmetic
Learnable upsampling

1D example:
- 1D convolution with stride 2: input x of size 8, output y of size 5, filter size 4
- 1D transposed convolution with stride 2
- 1D subpixel convolution with stride 1/2

Transposed convolution vs. fractionally strided convolution: the two operators can achieve the same result if the filters are learned.

Shi et al., Is the deconvolution layer the same as a convolutional layer?, 2016
Dilated convolutions

• Systematically aggregate multi-scale contextual information without losing resolution

Usual convolution:    (F ∗ k)(p) = Σ_t F(p − t) k(t)
Dilated convolution:  (F ∗_l k)(p) = Σ_t F(p − l·t) k(t)

F_{i+1} = F_i ∗_{2^i} k_i,   i = 0, 1, 2, …, n − 2

The receptive field grows exponentially as you add more layers, while the number of parameters increases only linearly.

Yu, Koltun, Multi-scale context aggregation by dilated convolutions, 2016
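A quick PyTorch sketch, added for illustration, of the effect of the dilation factor: the output size and the parameter count stay the same while the kernel's spatial span grows:

```python
# Sketch (PyTorch): dilation enlarges the receptive field without adding parameters
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)
for d in (1, 2, 4):
    conv = nn.Conv2d(1, 1, kernel_size=3, dilation=d, padding=d)   # padding=d keeps the size
    n_params = sum(p.numel() for p in conv.parameters())
    print(d, conv(x).shape, n_params)   # same 32x32 output and 10 params for every dilation;
                                        # the 3x3 kernel now spans (2d+1) x (2d+1) pixels
```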
  
Some architectures for Visual Recognition
  
ImageNet: ILSVRC

Large Scale Visual Recognition Challenge
www.image-net.org/challenges/LSVRC/

Image Classification
1000 object classes (categories)
Images:
- 1.2 M train
- 100k test
Metric: top-5 error rate (predict 5 classes)
  
AlexNet (2012)

• Similar framework to LeNet:
  • 8 parameter layers (5 convolutional, 3 fully connected)
  • Max pooling, ReLU nonlinearities
  • 650,000 units, 60 million parameters
  • trained on two GPUs for a week
  • Dropout regularization

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
  
AlexNet (2012)

Full AlexNet architecture: 8 layers
[227x227x3] INPUT
[55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0
[27x27x96] MAX POOL1: 3x3 filters at stride 2
[27x27x96] NORM1: Normalization layer
[27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
[13x13x256] MAX POOL2: 3x3 filters at stride 2
[13x13x256] NORM2: Normalization layer
[13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
[13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
[13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
[6x6x256] MAX POOL3: 3x3 filters at stride 2
[4096] FC6: 4096 neurons
[4096] FC7: 4096 neurons
[1000] FC8: 1000 neurons (class scores)

Details/Retrospectives:
- first use of ReLU
- used Norm layers (not common)
- heavy data augmentation
- dropout 0.5
- batch size 128
- SGD with Momentum 0.9
- Learning rate 1e-2, reduced by 10 manually when val accuracy plateaus
- L2 weight decay 5e-4

ILSVRC 2012 winner
7-CNN ensemble: 18.2% -> 15.4%
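The listed spatial sizes follow from the (N + 2P − F)/S + 1 rule; a quick check in Python, added for illustration:

```python
# Sketch: check the listed AlexNet spatial sizes with the (N + 2P - F)/S + 1 rule
def out_size(n, f, s, p):
    return (n + 2 * p - f) // s + 1

print(out_size(227, 11, 4, 0))   # 55  (CONV1)
print(out_size(55, 3, 2, 0))     # 27  (MAX POOL1)
print(out_size(27, 5, 1, 2))     # 27  (CONV2)
print(out_size(27, 3, 2, 0))     # 13  (MAX POOL2)
print(out_size(13, 3, 1, 1))     # 13  (CONV3-5)
print(out_size(13, 3, 2, 0))     # 6   (MAX POOL3)
```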
  
AlexNet (2012)

• Visualization of the 96 11x11 filters learned by the first layer

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
  
VGGNet-16 (2014)

Sequence of deeper nets trained progressively
Large receptive fields replaced by 3x3 convolutions

Only:
3x3 CONV, stride 1, pad 1 and
2x2 MAX POOL, stride 2

Shows that depth is a critical component for good performance

TOTAL memory: 24M * 4 bytes ~= 93MB / image (forward only! roughly *2 for backward)
most memory is in the early CONV layers

TOTAL params: 138M parameters
most parameters are in the late FC layers

K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015
  
GoogLeNet (2014)

- Inception module that dramatically reduced the number of parameters
- Uses average pooling instead of FC layers
- 22 layers
- 9 inception modules: Network in a network…

Compared to AlexNet:
12x fewer parameters (5M vs 60M)
2x more compute
6.67% top-5 error (vs 16.4%)

(Figure legend: Convolution, Pooling, Softmax, Other)

C. Szegedy et al., Going deeper with convolutions, CVPR 2015
  
GoogLeNet (2014)

• The Inception Module
  • Parallel paths with different receptive field sizes and operations are meant to capture sparse patterns of correlations in the stack of feature maps
  • Use 1x1 convolutions for dimensionality reduction before expensive convolutions

C. Szegedy et al., Going deeper with convolutions, CVPR 2015
  
GoogLeNet (2014)

Auxiliary classifiers
• Two softmax classifiers at intermediate layers combat the vanishing gradient while providing regularization at training time.
…and no fully connected layers needed!

(Figure legend: Convolution, Pooling, Softmax, Other)

C. Szegedy et al., Going deeper with convolutions, CVPR 2015
  
Inception v2, v3, v4

• Regularize training with batch normalization, reducing the importance of the auxiliary classifiers
• More variants of inception modules with aggressive factorization of filters

C. Szegedy et al., Rethinking the inception architecture for computer vision, CVPR 2016
C. Szegedy et al., Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, arXiv 2016
  
ResNet (2015)

ILSVRC 2015 winner (3.6% top-5 error)

MSRA: ILSVRC & COCO 2015 competitions
- ImageNet Classification: "ultra deep", 152 layers
- ImageNet Detection: 16% better than 2nd
- ImageNet Localization: 27% better than 2nd
- COCO Detection: 11% better than 2nd
- COCO Segmentation: 12% better than 2nd

2-3 weeks of training on an 8-GPU machine
At run-time: faster than a VGGNet! (even though it has 8x more layers)
Spatial dimension only 56x56!

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
  
ResNet (2015)

Plain net vs. Residual net
Residual learning: reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions
- Introduces skip or shortcut connections (existing before in the literature)
- Makes it easy for the network layers to represent the identity mapping

Training details:
- Batch Normalization after every CONV layer
- Xavier/2 initialization from He et al.
- SGD + Momentum (0.9)
- Learning rate: 0.1, divided by 10 when the validation error plateaus
- Mini-batch size 256
- Weight decay of 1e-5
- No dropout used

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
  
ResNet (2015)

• Deeper residual module (bottleneck)
  • Directly performing 3x3 convolutions with 256 feature maps at input and output:
    256 x 256 x 3 x 3 ~ 600K operations
  • Using 1x1 convolutions to reduce 256 to 64 feature maps, followed by 3x3 convolutions, followed by 1x1 convolutions to expand back to 256 maps:
    256 x 64 x 1 x 1 ~ 16K
    64 x 64 x 3 x 3 ~ 36K
    64 x 256 x 1 x 1 ~ 16K
    Total: ~70K

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
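A rough PyTorch sketch of such a bottleneck block, added for illustration (batch-norm placement and the projection shortcut for downsampling blocks are simplified relative to the paper):

```python
# Rough sketch of a ResNet-style bottleneck block: 1x1 reduce, 3x3, 1x1 expand + identity shortcut
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, channels=256, reduced=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, reduced, kernel_size=1, bias=False),
            nn.BatchNorm2d(reduced), nn.ReLU(inplace=True),
            nn.Conv2d(reduced, reduced, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(reduced), nn.ReLU(inplace=True),
            nn.Conv2d(reduced, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)   # shortcut (identity) connection

block = Bottleneck()
x = torch.randn(1, 256, 56, 56)
print(block(x).shape)                                             # torch.Size([1, 256, 56, 56])
print(sum(p.numel() for p in block.parameters() if p.dim() > 1))  # 69632 conv weights, ~70K as above
```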
  
Deeper models

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016
  
ILSVRC 2012-2015 summary

Team                                      | Year | Place | Error (top-5) | External data
SuperVision – Toronto (AlexNet, 7 layers) | 2012 | -     | 16.4%         | no
SuperVision                               | 2012 | 1st   | 15.3%         | ImageNet 22k
Clarifai – NYU (7 layers)                 | 2013 | -     | 11.7%         | no
Clarifai                                  | 2013 | 1st   | 11.2%         | ImageNet 22k
VGG – Oxford (16 layers)                  | 2014 | 2nd   | 7.32%         | no
GoogLeNet (19 layers)                     | 2014 | 1st   | 6.67%         | no
ResNet (152 layers)                       | 2015 | 1st   | 3.57%         |
Human expert*                             |      |       | 5.1%          |

http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
  
Summary

• Convolutional neural networks are a specialized kind of neural network for processing data that has a known, grid-like topology
• CNNs leverage these ideas:
  • local connectivity
  • parameter sharing
  • pooling / subsampling of hidden units
• Layers: convolutional, non-linear activation, pooling, upsampling, batch normalization
• Architectures for object recognition in images
  

Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 

Recently uploaded

Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
AlejandraGmez176757
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
theahmadsaood
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
James Polillo
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 

Recently uploaded (20)

Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 

Convolutional Neural Networks (DLAI D5L1 2017 UPC Deep Learning for Artificial Intelligence)

• 8. Convolutional neural networks: Parameter sharing
  • Each feature map forms a 2D grid of features
  • it can be computed with a discrete convolution of a kernel matrix k_ij, which is the hidden weights matrix W_ij with its rows and columns flipped
  y_j = f( Σ_i k_ij ∗ x_i ), where x_i is the i-th channel of the input and y_j is the j-th feature map of the hidden layer
  S. Credit: H. Larochelle
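A minimal NumPy sketch of this computation (the helper conv2d_valid, the toy 3-channel 8x8 input and the 5x5 kernels are illustrative, not from the slides; like most deep learning libraries, the kernel is applied as a cross-correlation rather than flipped):

import numpy as np

def conv2d_valid(x, k):
    # 'valid' 2D cross-correlation of a single channel x with kernel k
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))   # x_i: the i-th input channel
k = rng.standard_normal((3, 5, 5))   # k_ij for a single feature map j
b = 0.1                              # bias of feature map j

# y_j = f( sum_i k_ij * x_i + b ), here with f = ReLU
pre = sum(conv2d_valid(x[i], k[i]) for i in range(3)) + b
y = np.maximum(pre, 0)
print(y.shape)   # (4, 4): 8 - 5 + 1 on each spatial dimension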
• 9. Convolutional neural networks
  • Convolution as feature extraction (applying a filterbank)
  • but the filters are learned
  S. Credit: H. Larochelle
• 10. Convolutional neural networks: Pooling and subsampling
  • Third idea: pool hidden units in the same neighborhood
  • pooling is performed in non-overlapping neighborhoods (subsampling)
  • an alternative to ''max'' pooling is ''average'' pooling
  • pooling reduces dimensionality and provides invariance to small local changes
  S. Credit: H. Larochelle
• 11. Convolutional Neural Networks
  • Convolutional neural networks alternate between convolutional layers (each followed by a nonlinearity) and pooling layers (basic architecture)
  • For recognition: the output layer is a regular, fully connected layer with a softmax nonlinearity
    • its output provides an estimate of the conditional probability of each class
  • The network is trained by stochastic gradient descent (& variants) to learn the weights of the filters
    • backpropagation is used in the same way as in a fully connected network
  S. Credit: H. Larochelle
• 12. Convolutional Neural Networks
  'Classic' architecture for object recognition (BoW) vs. CNN: learning hierarchical representations with increasing levels of abstraction
  End-to-end training: joint optimization of features and classifier
  (Figure: feature visualization of a convolutional net trained on ImageNet; Zeiler and Fergus, 2013)
• 13. Example: LeNet-5
  • LeCun et al., 1998
  MNIST digit classification problem: handwritten digits, 60,000 training examples, 10,000 test samples, 10 classes, 28x28 grayscale images
  Conv filters were 5x5, applied at stride 1
  Sigmoid or tanh nonlinearity
  Subsampling (average pooling) layers were 2x2, applied at stride 2
  Fully connected layers at the end
  i.e. the architecture is [CONV-POOL-CONV-POOL-CONV-FC]
  Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86(11): 2278–2324, 1998.
• 15. Convolutional Neural Networks
  (Figure: a regular 3-layer neural network vs. a ConvNet with 3 layers)
  • In ConvNets the inputs are 'images' (the architecture is constrained)
  • A ConvNet arranges its neurons in three dimensions (width, height, depth)
  • Every layer transforms a 3D input volume into a 3D output volume of neuron activations
  S. Credit: Stanford cs231_2017
• 16. Convolutional layer
  • Convolution on a 2D grid (figure)
• 17. Convolutional layer
  • Convolution on a volume
  32x32x3 input (width, height, depth), 5x5x3 filter
  Filters always extend the full depth of the input volume
  Convolve the filter with the input, i.e. slide it over the input spatially, computing dot products
  S. Credit: Stanford cs231_2017
• 18. Convolutional layer
  • Convolution on a volume
  32x32x3 input, 5x5x3 filter w
  Convolve (slide) the filter over all spatial locations to produce an activation map (feature map)
  Each output is 1 number: the result of taking the dot product between the filter and a small 5x5x3 chunk of the input (a 5x5x3 = 75-dimensional dot product plus a bias): w^T x + b
  S. Credit: Stanford cs231_2017
• 19. Convolutional layer
  32x32x3 input, 5x5x3 filter w
  Convolve (slide) over all spatial locations; a second (green) filter produces its own activation map
  S. Credit: Stanford cs231_2017
• 20. Convolutional layer
  Activation maps (feature maps)
  If we have 6 filters of size 5x5x3, we get 6 separate activation maps; we stack the maps up to get a 'new volume' of size 28x28x6 (see the sketch below)
  So applying a filterbank to an input (3D matrix) yields a cube-like output: a 3D matrix in which each slice is the output of the convolution with one filter.
  S. Credit: Stanford cs231_2017
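To make the shapes concrete, a rough NumPy sketch with random data (stride 1, no padding; the nested loops are for clarity, not efficiency):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 3))         # input volume (H, W, D)
filters = rng.standard_normal((6, 5, 5, 3))  # 6 filters of size 5x5x3
biases = np.zeros(6)

H, W, _ = x.shape
F = 5
out = np.zeros((H - F + 1, W - F + 1, 6))    # one activation map per filter
for f in range(6):
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # dot product between the filter and a 5x5x3 chunk of the input, plus bias
            out[i, j, f] = np.sum(x[i:i+F, j:j+F, :] * filters[f]) + biases[f]

print(out.shape)   # (28, 28, 6)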
• 21. Convolutional layer
  A ConvNet is a sequence of convolutional layers, interspersed with activation functions and pooling layers (and a small number of fully connected layers)
  We add more layers of filters: we apply filters (convolutions) to the output volume of the previous layer, and the result of each convolution is a slice in the new volume.
  S. Credit: Stanford cs231_2017
• 22. Example: filters and activation maps
  Example CNN trained for image recognition on the CIFAR dataset
  The network learns filters that activate when they see some specific type of feature at some spatial position in the input. Stacking the activation maps of all filters along the depth dimension forms the full output volume.
  http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
• 23. Convolution
  Hyperparameters: filter spatial extent F, stride S, padding P
  Stride is the number of pixels by which we slide the filter matrix over the input matrix; a larger stride produces smaller feature maps.
  7x7 input (spatially), 3x3 filter, stride 1: 5x5 output
  7x7 input (spatially), 3x3 filter applied with stride 2: 3x3 output
  S. Credit: Stanford cs231_2017
• 24. Convolution
  Hyperparameters: filter spatial extent F, stride S, padding P
  Output size: (N - F) / stride + 1
  e.g. N=7, F=3:
    stride 1: (7-3)/1+1 = 5
    stride 2: (7-3)/2+1 = 3
    stride 3: (7-3)/3+1 = 2.33, not applicable
  Zero-padding the border is common. E.g. N=7, F=3, stride 1, pad with a 1-pixel border: output size 7x7
  In general it is common to see CONV layers with stride 1, FxF filters and zero-padding with P = (F-1)/2, which preserves the spatial size.
  S. Credit: Stanford cs231_2017
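These size rules are easy to check in code; a small helper reproducing the numbers above (the function name conv_output_size is just illustrative):

def conv_output_size(N, F, S, P=0):
    # spatial output size for an NxN input, FxF filter, stride S, padding P
    size = (N - F + 2 * P) / S + 1
    if not size.is_integer():
        raise ValueError("filter does not fit cleanly across the input")
    return int(size)

print(conv_output_size(7, 3, 1))        # 5
print(conv_output_size(7, 3, 2))        # 3
print(conv_output_size(7, 3, 1, P=1))   # 7  (zero-padding preserves the size)
# conv_output_size(7, 3, 3) would raise: (7-3)/3+1 is not an integer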
• 25. 1x1 convolutions
  1x1 convolution layers are used to reduce dimensionality (the number of feature maps)
  Example: a 1x1 conv with 32 filters on a 64-channel input; each filter has size 1x1x64 and performs a 64-dimensional dot product
  S. Credit: Stanford cs231_2017
• 26. Example: size, parameters
  Input volume: 32x32x3; 10 5x5 filters with stride 1, pad 2
  Output volume size: (N + 2P - F) / S + 1 = (32 + 2*2 - 5)/1 + 1 = 32 spatially, so 32x32x10
  Number of parameters in this layer: each filter has 5x5x3 + 1 = 76 params (+1 for the bias) -> 76x10 = 760 parameters
  S. Credit: Stanford cs231_2017
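The same bookkeeping in a few lines of Python, reproducing the numbers on the slide:

# 32x32x3 input, 10 filters of 5x5, stride 1, pad 2
N, D, K, F, S, P = 32, 3, 10, 5, 1, 2

out_spatial = (N + 2 * P - F) // S + 1   # 32
params_per_filter = F * F * D + 1        # 76 (+1 for the bias)
total_params = params_per_filter * K     # 760

print(out_spatial, out_spatial, K)       # 32 32 10  -> output volume 32x32x10
print(params_per_filter, total_params)   # 76 760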
• 27. Summary: conv layer
  To summarize, the Conv Layer
  • Accepts a volume of size W1 x H1 x D1
  • Requires four hyperparameters:
    • number of filters K
    • their spatial extent F
    • the stride S
    • the amount of zero padding P
  • Produces a volume of size W2 x H2 x D2
    • W2 = (W1 - F + 2P) / S + 1
    • H2 = (H1 - F + 2P) / S + 1
    • D2 = K
  • With parameter sharing, it introduces F·F·D1 weights per filter, for a total of (F·F·D1)·K weights and K biases
  • In the output volume, the d-th depth slice (of size W2 x H2) is the result of performing a valid convolution of the d-th filter over the input volume with a stride of S, and then offsetting by the d-th bias
  Common settings:
    K = powers of 2 (32, 64, 128, 256)
    F=3, S=1, P=1
    F=5, S=1, P=2
    F=5, S=2, P=? (whatever fits)
    F=1, S=1, P=0
  S. Credit: Stanford cs231_2017
• 28. Activation functions
  • Desirable properties: mostly smooth, continuous, differentiable, fairly linear
  Sigmoid: σ(x) = 1 / (1 + e^(-x))
  tanh: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
  ReLU: max(0, x)
  Leaky ReLU: max(0.1x, x)
  Maxout: max(w1^T x + b1, w2^T x + b2)
  ELU
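For reference, simple NumPy versions of these functions (the ELU formula with α = 1 is an assumption, since the slide only names it; Maxout is omitted because it needs two learned weight vectors):

import numpy as np

def sigmoid(x):     return 1.0 / (1.0 + np.exp(-x))
def tanh(x):        return np.tanh(x)
def relu(x):        return np.maximum(0.0, x)
def leaky_relu(x):  return np.maximum(0.1 * x, x)
def elu(x, a=1.0):  return np.where(x > 0, x, a * (np.exp(x) - 1.0))

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # [0.   0.   0.   0.5  2. ]
print(leaky_relu(x))  # [-0.2  -0.05  0.   0.5  2. ]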
• 29. Activation functions
  • Example (figure)
• 30. Pooling layer
  • Makes the representations smaller and more manageable for later layers
  • Useful to get invariance to small local changes
  • Operates over each activation map independently
  S. Credit: Stanford cs231_2017
• 31. Pooling layer
  • Max pooling: e.g. max pool over a single depth slice with 2x2 filters and stride 2
  • Other pooling functions: average pooling or L2-norm pooling
  S. Credit: Stanford cs231_2017
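A minimal NumPy sketch of 2x2 max pooling with stride 2 on one depth slice (the input values are only illustrative):

import numpy as np

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)   # a single depth slice

F, S = 2, 2
out = np.zeros((x.shape[0] // S, x.shape[1] // S))
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        out[i, j] = x[i*S:i*S+F, j*S:j*S+F].max()   # max over each 2x2 window
print(out)
# [[6. 8.]
#  [3. 4.]]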
• 32. Pooling layer
  • Example (figure)
• 33. Summary: pooling layer
  To summarize, the pooling layer
  • Accepts a volume of size W1 x H1 x D1
  • Requires two hyperparameters:
    • the spatial extent F
    • the stride S
  • Produces a volume of size W2 x H2 x D2, where
    • W2 = (W1 - F) / S + 1
    • H2 = (H1 - F) / S + 1
    • D2 = D1
  • Introduces zero parameters, since it computes a fixed function of the input
  Common settings: F=2, S=2; F=3, S=2
  S. Credit: Stanford cs231_2017
• 34. Fully connected layer
  • At the end it is common to add one or more fully (or densely) connected layers.
  • Every neuron in the previous layer is connected to every neuron in the next layer (as in regular neural networks). The activation is computed as a matrix multiplication plus a bias.
  • At the output, a softmax activation is used for classification
  (Figure: 4 possible outputs; connections and weights not shown)
• 35. Fully connected layers and convolutional layers
  • A convolutional layer can be implemented as a fully connected layer
  • The weight matrix is a large matrix that is mostly zero except for certain blocks (due to local connectivity)
  Example: input image I of size 4x4, vectorized to 16x1; output image O of size 4x1 (later reshaped to 2x2); 3x3 kernel h; weight matrix C of size 16x4
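A sketch of this equivalence under the slide's assumptions (4x4 input, 3x3 kernel, valid convolution giving a 2x2 output). The matrix here is built as 4x16 acting on the vectorized input, i.e. the transpose of the 16x4 matrix written on the slide; the data is random and the construction is illustrative:

import numpy as np

rng = np.random.default_rng(0)
I = rng.standard_normal((4, 4))   # input image
h = rng.standard_normal((3, 3))   # kernel

# direct 'valid' cross-correlation: 2x2 output
direct = np.array([[np.sum(I[i:i+3, j:j+3] * h) for j in range(2)] for i in range(2)])

# equivalent fully connected layer: a mostly-zero weight matrix C applied to the vectorized input
C = np.zeros((4, 16))
for o, (i, j) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    W = np.zeros((4, 4))
    W[i:i+3, j:j+3] = h          # kernel placed at the receptive field of output (i, j)
    C[o] = W.ravel()

as_fc = (C @ I.ravel()).reshape(2, 2)
print(np.allclose(direct, as_fc))   # True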
• 36. Fully connected layers and convolutional layers (figure)
• 37. Fully connected layers and convolutional layers
  • Fully connected layers can also be viewed as convolutions with kernels that cover the entire input region
  • Example: a fully connected layer with K=4096 neurons and input volume 32x32x512 can be expressed as a convolutional layer with F=32 (kernel size), P=0, S=1, K=4096 (number of filters)
    • the filter size is exactly the size of the input volume
    • the output is 1x1x4096
• 38. Batch normalization layer
  • As learning progresses, the distribution of the layer inputs changes due to parameter updates (internal covariate shift)
  • This can result in most inputs being in the non-linear regime of the activation function, slowing down learning
  • Batch normalization is a technique to reduce this effect
  • Explicitly force the layer activations to have zero mean and unit variance w.r.t. running batch estimates:
      x̂^(k) = (x^(k) - E[x^(k)]) / sqrt(Var(x^(k)))
  • Add a learnable scale and bias term to allow the network to still use the nonlinearity:
      y^(k) = γ^(k) x̂^(k) + β^(k)
  Typical placement: FC / Conv -> Batch norm -> ReLU
  Ioffe and Szegedy, 2015. "Batch normalization: accelerating deep network training by reducing internal covariate shift"
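A training-mode NumPy sketch of the normalization above for a mini-batch of fully connected activations (the small eps term is a standard numerical-stability addition not shown on the slide; at test time the running mean/variance estimates would be used instead of the batch statistics):

import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # batch normalization over a mini-batch x of shape (N, D), training mode only
    mu = x.mean(axis=0)                      # per-feature mean E(x^(k))
    var = x.var(axis=0)                      # per-feature variance Var(x^(k))
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance
    return gamma * x_hat + beta              # learnable scale and shift

rng = np.random.default_rng(0)
x = 5.0 + 2.0 * rng.standard_normal((128, 4096))   # e.g. activations of an FC layer
y = batchnorm_forward(x, gamma=np.ones(4096), beta=np.zeros(4096))
print(y.mean(axis=0)[:3].round(6), y.std(axis=0)[:3].round(3))   # ~0 and ~1 per feature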
• 39. Upsampling layers: recovering spatial shape
  • Motivation: semantic segmentation; make predictions for all pixels at once
  Problem: convolutions at the original image resolution would be very expensive
  S. Credit: Stanford cs231_2017
• 40. Upsampling layers: recovering spatial shape
  • Motivation: semantic segmentation; make predictions for all pixels at once
  Design the network as a sequence of convolutional layers with downsampling and upsampling
  Other applications: super-resolution, flow estimation, generative modeling
  S. Credit: Stanford cs231_2017
• 41. Learnable upsampling
  • Recall: 3x3 convolution, stride 1, pad 1
  S. Credit: Stanford cs231_2017
• 42. Learnable upsampling
  • Recall: 3x3 convolution, stride 2, pad 1
  The filter moves 2 pixels in the input for every one pixel in the output; the stride gives the ratio between movement in the input and in the output
  S. Credit: Stanford cs231_2017
• 43. Learnable upsampling: transposed convolution
  • 3x3 transposed convolution, stride 2, pad 1
  The filter moves 2 pixels in the output for every one pixel in the input; the stride gives the ratio between movement in the output and in the input
  Contributions are summed where the outputs overlap
  S. Credit: Stanford cs231_2017
• 44. Learnable upsampling: transposed convolution
  • 1D example
  Other names: transposed convolution, backward strided convolution, fractionally strided convolution, upconvolution, "deconvolution"
  More info: Dumoulin et al., A guide to convolution arithmetic for deep learning, 2016, https://github.com/vdumoulin/conv_arithmetic; Shi, Is the deconvolution layer the same as a convolutional layer?, 2016
  S. Credit: Stanford cs231_2017
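A minimal 1D sketch of the "sum where outputs overlap" behavior, assuming a toy input and an all-ones filter (both illustrative):

import numpy as np

def transposed_conv1d(x, k, stride=2):
    # each input value paints a scaled copy of the kernel into the output,
    # shifted by `stride`; overlapping contributions are summed
    out = np.zeros(stride * (len(x) - 1) + len(k))
    for i, v in enumerate(x):
        out[i * stride : i * stride + len(k)] += v * k
    return out

x = np.array([1.0, 2.0, 3.0])
k = np.array([1.0, 1.0, 1.0])
print(transposed_conv1d(x, k, stride=2))
# [1. 1. 3. 2. 5. 3. 3.]  <- where the shifted copies overlap, the values are summed (e.g. 1+2=3)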
• 45. Learnable upsampling
  It is always possible to emulate a transposed convolution with a direct convolution (with fractional stride)
  More info: Dumoulin et al., A guide to convolution arithmetic for deep learning, 2016, https://github.com/vdumoulin/conv_arithmetic
• 46. Learnable upsampling
  1D example: 1D convolution with stride 2 (input x of size 8, output y of size 5, filter size 4); 1D transposed convolution with stride 2; 1D subpixel convolution with stride 1/2
  Transposed convolution vs. fractionally strided convolution: the two operators can achieve the same result if the filters are learned.
  Shi, Is the deconvolution layer the same as a convolutional layer?, 2016
• 47. Dilated convolutions
  • Systematically aggregate multi-scale contextual information without losing resolution
  Usual convolution: (F ∗ k)(p) = Σ_t F(p - t) k(t)
  Dilated convolution: (F ∗_l k)(p) = Σ_t F(p - l·t) k(t)
  Stacking layers F_{i+1} = F_i ∗_{2^i} k_i, i = 0, 1, 2, ..., n-2: the receptive field grows exponentially as you add more layers, while the number of parameters increases only linearly
  Yu, Koltun, Multi-scale context aggregation by dilated convolutions, 2016
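A 1D 'valid' implementation of the dilated formula above (toy data; the helper name and the edge-handling choice are assumptions):

import numpy as np

def dilated_conv1d_valid(F, k, l=1):
    # (F *_l k)(p) = sum_t F(p - l*t) k(t), keeping only fully covered positions
    span = l * (len(k) - 1)              # how far the kernel reaches
    out = np.zeros(len(F) - span)
    for p in range(len(out)):
        out[p] = sum(F[p + span - l * t] * k[t] for t in range(len(k)))
    return out

F = np.arange(16, dtype=float)
k = np.array([1.0, 0.0, -1.0])
print(len(dilated_conv1d_valid(F, k, l=1)))   # 14: the 3 weights span 3 samples
print(len(dilated_conv1d_valid(F, k, l=4)))   # 8:  the same 3 weights span 9 samples
# stacking layers with dilations 1, 2, 4, ... grows the receptive field exponentially
# while the number of weights grows only linearly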
• 48. Some architectures for visual recognition
• 49. ImageNet: ILSVRC (Large Scale Visual Recognition Challenge)
  Image classification: 1000 object classes (categories)
  Images: 1.2 M train, 100k test
  Metric: top-5 error rate (predict 5 classes)
  www.image-net.org/challenges/LSVRC/
• 50. AlexNet (2012)
  • Similar framework to LeNet:
    • 8 parameter layers (5 convolutional, 3 fully connected)
    • max pooling, ReLU nonlinearities
    • 650,000 units, 60 million parameters
    • trained on two GPUs for a week
    • dropout regularization
  A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
• 51. AlexNet (2012)
  Full AlexNet architecture: 8 layers
  [227x227x3] INPUT
  [55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0
  [27x27x96] MAX POOL1: 3x3 filters at stride 2
  [27x27x96] NORM1: normalization layer
  [27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
  [13x13x256] MAX POOL2: 3x3 filters at stride 2
  [13x13x256] NORM2: normalization layer
  [13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
  [13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
  [13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
  [6x6x256] MAX POOL3: 3x3 filters at stride 2
  [4096] FC6: 4096 neurons
  [4096] FC7: 4096 neurons
  [1000] FC8: 1000 neurons (class scores)
  Details/Retrospectives:
  - first use of ReLU
  - used Norm layers (not common)
  - heavy data augmentation
  - dropout 0.5
  - batch size 128
  - SGD momentum 0.9
  - learning rate 1e-2, reduced by 10 manually when val accuracy plateaus
  - L2 weight decay 5e-4
  ILSVRC 2012 winner; 7 CNN ensemble: 18.2% -> 15.4%
• 52. AlexNet (2012)
  • Visualization of the 96 11x11 filters learned by the first layer
  A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
• 53. VGGNet-16 (2014)
  Sequence of deeper nets trained progressively
  Large receptive fields replaced by 3x3 convolutions
  Only 3x3 CONV with stride 1, pad 1 and 2x2 MAX POOL with stride 2
  Shows that depth is a critical component for good performance
  TOTAL memory: 24M * 4 bytes ~= 93 MB / image (forward only; ~x2 for backward); most memory is in the early CONV layers
  TOTAL params: 138M parameters; most parameters are in the late FC layers
  K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015
• 54. GoogLeNet (2014)
  - Inception module that dramatically reduced the number of parameters
  - Uses average pooling instead of FC layers
  - 22 layers, 9 inception modules: network in a network...
  Compared to AlexNet: 12x fewer parameters (5M vs 60M), 2x more compute, 6.67% top-5 error (vs 16.4%)
  (Figure legend: Convolution, Pooling, Softmax, Other)
  C. Szegedy et al., Going deeper with convolutions, CVPR 2015
• 55. GoogLeNet (2014)
  • The Inception module
  • Parallel paths with different receptive field sizes and operations are meant to capture sparse patterns of correlations in the stack of feature maps
  • 1x1 convolutions are used for dimensionality reduction before the expensive convolutions
  C. Szegedy et al., Going deeper with convolutions, CVPR 2015
• 56. GoogLeNet (2014)
  • Two auxiliary softmax classifiers at intermediate layers combat the vanishing gradient while providing regularization at training time
  ... and no fully connected layers are needed!
  (Figure legend: Convolution, Pooling, Softmax, Other)
  C. Szegedy et al., Going deeper with convolutions, CVPR 2015
• 57. Inception v2, v3, v4
  • Regularize training with batch normalization, reducing the importance of the auxiliary classifiers
  • More variants of inception modules with aggressive factorization of filters
  C. Szegedy et al., Rethinking the inception architecture for computer vision, CVPR 2016
  C. Szegedy et al., Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, arXiv 2016
• 58. ResNet (2015)
  ILSVRC 2015 winner (3.6% top-5 error)
  MSRA: ILSVRC & COCO 2015 competitions
  - ImageNet Classification: "ultra deep", 152 layers
  - ImageNet Detection: 16% better than 2nd
  - ImageNet Localization: 27% better than 2nd
  - COCO Detection: 11% better than 2nd
  - COCO Segmentation: 12% better than 2nd
  2-3 weeks of training on an 8-GPU machine; at run time faster than a VGGNet (even though it has 8x more layers); spatial dimension only 56x56!
  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
• 59. ResNet (2015)
  Plain net vs. residual net
  Residual learning: reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions
  - Introduces skip or shortcut connections (existing before in the literature)
  - Makes it easy for the layers to represent the identity mapping
  Training details:
  - Batch Normalization after every CONV layer
  - Xavier/2 initialization from He et al.
  - SGD + momentum (0.9)
  - Learning rate 0.1, divided by 10 when the validation error plateaus
  - Mini-batch size 256
  - Weight decay of 1e-5
  - No dropout used
  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
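A toy NumPy sketch of the skip connection (layer1 and layer2 are stand-ins for conv + batch-norm stages of matching shape; the exact placement of ReLU and normalization follows one common arrangement and is not taken from the slide):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, layer1, layer2):
    # the stacked layers learn F(x); the block outputs relu(F(x) + x)
    return relu(layer2(relu(layer1(x))) + x)

# identity is easy to represent: if the stacked layers output ~0, the block passes x through
x = np.array([1.0, -2.0, 3.0])
zero_layer = lambda v: np.zeros_like(v)
print(residual_block(x, zero_layer, zero_layer))   # [1. 0. 3.]  (ReLU of x + 0)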
• 60. ResNet (2015)
  • Deeper residual module (bottleneck)
  • Directly performing 3x3 convolutions with 256 feature maps at input and output: 256 x 256 x 3 x 3 ~ 600K operations
  • Using 1x1 convolutions to reduce 256 to 64 feature maps, followed by 3x3 convolutions, followed by 1x1 convolutions to expand back to 256 maps:
    256 x 64 x 1 x 1 ~ 16K
    64 x 64 x 3 x 3 ~ 36K
    64 x 256 x 1 x 1 ~ 16K
    Total: ~70K
  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
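The multiply counts above, checked in a few lines of Python:

# multiply counts (per spatial position) for the two residual block designs
direct_3x3 = 256 * 256 * 3 * 3          # 589824, i.e. ~600K on the slide
bottleneck = (256 * 64 * 1 * 1
              + 64 * 64 * 3 * 3
              + 64 * 256 * 1 * 1)       # 69632, i.e. ~70K on the slide
print(direct_3x3, bottleneck, round(direct_3x3 / bottleneck, 1))   # 589824 69632 8.5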
• 61. Deeper models
  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016
• 62. ILSVRC 2012-2015 summary
  Team                                       | Year | Place | Error (top-5) | External data
  SuperVision - Toronto (AlexNet, 7 layers)  | 2012 |  -    | 16.4%         | no
  SuperVision                                | 2012 | 1st   | 15.3%         | ImageNet 22k
  Clarifai - NYU (7 layers)                  | 2013 |  -    | 11.7%         | no
  Clarifai                                   | 2013 | 1st   | 11.2%         | ImageNet 22k
  VGG - Oxford (16 layers)                   | 2014 | 2nd   | 7.32%         | no
  GoogLeNet (19 layers)                      | 2014 | 1st   | 6.67%         | no
  ResNet (152 layers)                        | 2015 | 1st   | 3.57%         |
  Human expert*                              |      |       | 5.1%          |
  http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
• 63. Summary
  • Convolutional neural networks are a specialized kind of neural network for processing data that has a known, grid-like topology
  • CNNs leverage these ideas: local connectivity, parameter sharing, pooling / subsampling of hidden units
  • Layers: convolutional, non-linear activation, pooling, upsampling, batch normalization
  • Architectures for object recognition in images