Memo: Backpropagation in Convolutional Neural Network
Hiroshi Kuwajima
13-03-2014 Created
14-08-2014 Revised

1 /14
2 /14
Note

■ Purpose
The purpose of this memo is to understand and recall the backpropagation algorithm in Convolutional Neural Networks, based on a discussion with Prof. Masayuki Tanaka.

■ Table of Contents
In this memo, backpropagation algorithms in different neural networks are explained in the following order.

□ Single neuron  3
□ Multi-layer neural network  5
□ General cases  7
□ Convolution layer  9
□ Pooling layer  11
□ Convolutional Neural Network  13

■ Notation
This memo follows the notation in the UFLDL tutorial (http://ufldl.stanford.edu/tutorial).
3 /14
Neural Network as a Composite Function

A neural network is decomposed into a composite function where each function element corresponds to a differentiable operation.

■ Single neuron (the simplest neural network) example
A single neuron is decomposed into a composite function of an affine function element, parameterized by W and b, and an activation function element f, which we choose to be the sigmoid function.

Derivatives of both the affine and sigmoid function elements w.r.t. both inputs and parameters are known. Note that the sigmoid function has neither parameters nor derivatives w.r.t. parameters. The sigmoid function is applied element-wise. '•' denotes the Hadamard, or element-wise, product.

$$ h_{W,b}(x) = f\left(W^T x + b\right) = \mathrm{sigmoid}\left(\mathrm{affine}_{W,b}(x)\right) = \left(\mathrm{sigmoid} \circ \mathrm{affine}_{W,b}\right)(x) $$

$$ \frac{\partial a}{\partial z} = a \bullet (1 - a) \quad \text{where } a = h_{W,b}(x) = \mathrm{sigmoid}(z) = \frac{1}{1 + \exp(-z)} $$

$$ \frac{\partial z}{\partial x} = W, \quad \frac{\partial z}{\partial W} = x, \quad \frac{\partial z}{\partial b} = I \quad \text{where } z = \mathrm{affine}_{W,b}(x) = W^T x + b \text{ and } I \text{ is the identity matrix} $$

[Figure: decomposition of a neuron. Standard network representation: inputs x1, x2, x3, +1 → hW,b(x). Composite function representation: Affine → Activation (e.g. sigmoid), with intermediate z and output a.]
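As a concrete illustration of this decomposition (an illustrative addition, not part of the original memo), here is a minimal NumPy sketch of a single neuron viewed as sigmoid ∘ affine, together with the local derivatives listed above. The variable names follow the memo's notation; the concrete values are arbitrary.

```python
import numpy as np

def affine(W, b, x):
    """Affine function element: z = W^T x + b."""
    return W.T @ x + b

def sigmoid(z):
    """Element-wise sigmoid activation: a = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Single neuron h_{W,b}(x) = (sigmoid . affine_{W,b})(x)
x = np.array([0.5, -1.0, 2.0])        # three inputs
W = np.array([[0.1], [0.2], [-0.3]])  # 3x1 weight matrix
b = np.array([0.05])

z = affine(W, b, x)       # pre-activation
a = sigmoid(z)            # output h_{W,b}(x)

# Local derivatives, known in closed form (symbolically):
da_dz = a * (1.0 - a)     # da/dz, element-wise (Hadamard) product
dz_dx = W                 # dz/dx = W
dz_dW = x                 # dz/dW = x
dz_db = np.eye(len(z))    # dz/db = I (identity matrix)
```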
4 /14
Chain Rule of Error Signals and Gradients

Error signals are defined as the derivatives of any cost function J, which we choose to be the square error. Error signals are computed (propagated backward) by the chain rule of derivatives, and they are useful for computing the gradient of the cost function.

■ Single neuron example
Suppose we have m labeled training examples {(x(1), y(1)), …, (x(m), y(m))}. The square error cost function for each example is as follows. The overall cost function is the sum of the cost functions over all examples.

$$ J(W,b;x,y) = \frac{1}{2}\left\|y - h_{W,b}(x)\right\|^2 $$

Error signals of the square error cost function for each example are propagated using derivatives of the function elements w.r.t. inputs.

$$ \delta^{(a)} = \frac{\partial}{\partial a} J(W,b;x,y) = -(y - a) $$

$$ \delta^{(z)} = \frac{\partial}{\partial z} J(W,b;x,y) = \frac{\partial J}{\partial a}\frac{\partial a}{\partial z} = \delta^{(a)} \bullet a \bullet (1 - a) $$

The gradient of the cost function w.r.t. parameters for each example is computed using error signals and derivatives of the function elements w.r.t. parameters. Summing the gradients over all examples gives the overall gradient.

$$ \nabla_W J(W,b;x,y) = \frac{\partial}{\partial W} J(W,b;x,y) = \frac{\partial J}{\partial z}\frac{\partial z}{\partial W} = \delta^{(z)} x^T $$

$$ \nabla_b J(W,b;x,y) = \frac{\partial}{\partial b} J(W,b;x,y) = \frac{\partial J}{\partial z}\frac{\partial z}{\partial b} = \delta^{(z)} $$
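Continuing the single-neuron sketch (again an illustrative addition, not from the original slides): the error signals and parameter gradients for one training example (x, y) under the square error cost, with arbitrary values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One labeled training example (x, y) and current parameters (memo's notation).
x = np.array([0.5, -1.0, 2.0])
y = np.array([1.0])
W = np.array([[0.1], [0.2], [-0.3]])
b = np.array([0.05])

# Forward pass: z = W^T x + b, a = sigmoid(z).
z = W.T @ x + b
a = sigmoid(z)

# Square error cost for this example: J = 1/2 * ||y - a||^2.
J = 0.5 * np.sum((y - a) ** 2)

# Error signals, propagated backward by the chain rule.
delta_a = -(y - a)                # dJ/da
delta_z = delta_a * a * (1 - a)   # dJ/dz = dJ/da . da/dz (Hadamard product)

# Gradients w.r.t. the parameters of the affine element.
grad_W = np.outer(x, delta_z)     # dJ/dW_{ij} = x_i * delta_z_j, same shape as W
grad_b = delta_z                  # dJ/db
```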
5 /14
Decomposition of Multi-Layer Neural Network

■ Composite function representation of a multi-layer neural network

$$ h_{W,b}(x) = \left(\mathrm{sigmoid} \circ \mathrm{affine}_{W^{(2)},b^{(2)}} \circ \mathrm{sigmoid} \circ \mathrm{affine}_{W^{(1)},b^{(1)}}\right)(x) $$

$$ a^{(1)} = x, \quad a^{(l_{\max})} = h_{W,b}(x) $$

■ Derivatives of function elements w.r.t. inputs and parameters

$$ \frac{\partial a^{(l+1)}}{\partial z^{(l+1)}} = a^{(l+1)} \bullet \left(1 - a^{(l+1)}\right) \quad \text{where } a^{(l+1)} = \mathrm{sigmoid}\left(z^{(l+1)}\right) = \frac{1}{1 + \exp\left(-z^{(l+1)}\right)} $$

$$ \frac{\partial z^{(l+1)}}{\partial a^{(l)}} = W^{(l)}, \quad \frac{\partial z^{(l+1)}}{\partial W^{(l)}} = a^{(l)}, \quad \frac{\partial z^{(l+1)}}{\partial b^{(l)}} = I \quad \text{where } z^{(l+1)} = \left(W^{(l)}\right)^T a^{(l)} + b^{(l)} $$

[Figure: decomposition of a two-layer network. Standard network representation: inputs x1, x2, x3, +1 → Layer 1 → Layer 2 → hW,b(x), with hidden activations a1(2), a2(2), a3(2). Composite function representation: Affine 1 → Sigmoid 1 → Affine 2 → Sigmoid 2, with intermediates z(2), a(2), z(3), a(3).]
6 /14
Error Signals and Gradients in Multi-Layer NN

■ Error signals of the square error cost function for each example

$$ \delta^{\left(a^{(l)}\right)} = \frac{\partial}{\partial a^{(l)}} J(W,b;x,y) = \begin{cases} -\left(y - a^{(l)}\right) & \text{for } l = l_{\max} \\[6pt] \dfrac{\partial J}{\partial z^{(l+1)}} \dfrac{\partial z^{(l+1)}}{\partial a^{(l)}} = \left(W^{(l)}\right)^T \delta^{\left(z^{(l+1)}\right)} & \text{otherwise} \end{cases} $$

$$ \delta^{\left(z^{(l)}\right)} = \frac{\partial}{\partial z^{(l)}} J(W,b;x,y) = \frac{\partial J}{\partial a^{(l)}} \frac{\partial a^{(l)}}{\partial z^{(l)}} = \delta^{\left(a^{(l)}\right)} \bullet a^{(l)} \bullet \left(1 - a^{(l)}\right) $$

■ Gradient of the cost function w.r.t. parameters for each example

$$ \nabla_{W^{(l)}} J(W,b;x,y) = \frac{\partial}{\partial W^{(l)}} J(W,b;x,y) = \frac{\partial J}{\partial z^{(l+1)}} \frac{\partial z^{(l+1)}}{\partial W^{(l)}} = \delta^{\left(z^{(l+1)}\right)} \left(a^{(l)}\right)^T $$

$$ \nabla_{b^{(l)}} J(W,b;x,y) = \frac{\partial}{\partial b^{(l)}} J(W,b;x,y) = \frac{\partial J}{\partial z^{(l+1)}} \frac{\partial z^{(l+1)}}{\partial b^{(l)}} = \delta^{\left(z^{(l+1)}\right)} $$
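To make the multi-layer formulas concrete, here is a small NumPy sketch (an illustrative addition, using the memo's convention z(l+1) = (W(l))^T a(l) + b(l)) of the forward and backward passes through a 3-2-1 network with sigmoid activations and square error. The layer sizes and values are assumptions for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# 3 inputs -> 2 hidden -> 1 output; z(l+1) = W(l)^T a(l) + b(l).
W1, b1 = rng.normal(size=(3, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)
x, y = np.array([0.5, -1.0, 2.0]), np.array([1.0])

# Forward pass: a(1) = x, a(lmax) = h_{W,b}(x).
a1 = x
z2 = W1.T @ a1 + b1
a2 = sigmoid(z2)
z3 = W2.T @ a2 + b2
a3 = sigmoid(z3)

# Backward pass: error signals of the square error cost.
delta_a3 = -(y - a3)                  # output layer: delta(a) = -(y - a)
delta_z3 = delta_a3 * a3 * (1 - a3)   # delta(z) = delta(a) . a . (1 - a)
delta_a2 = W2 @ delta_z3              # propagate back through Affine 2
delta_z2 = delta_a2 * a2 * (1 - a2)

# Gradients for this example (dJ/dW_{ij} = a_i * delta_z_j under this convention).
grad_W2, grad_b2 = np.outer(a2, delta_z3), delta_z3
grad_W1, grad_b1 = np.outer(a1, delta_z2), delta_z2
```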
7 /14
Backpropagation in General Cases

1. Decompose the operations in the layers of a neural network into function elements whose derivatives w.r.t. inputs are known by symbolic computation.
2. Backpropagate the error signals corresponding to a differentiable cost function by numerical computation (starting from the cost function, plug in error signals backward).
3. Use the backpropagated error signals to compute gradients w.r.t. parameters, only for the function elements with parameters, whose derivatives w.r.t. parameters are known by symbolic computation.
4. Sum the gradients over all examples to get the overall gradient. (A minimal code sketch of this recipe is given after the equations below.)

$$ h_\theta(x) = \left(f^{(l_{\max})} \circ \cdots \circ f^{(l)}_{\theta^{(l)}} \circ \cdots \circ f^{(2)}_{\theta^{(2)}} \circ f^{(1)}\right)(x) \quad \text{where } f^{(1)} = x, \; f^{(l_{\max})} = h_\theta(x), \text{ and } \forall l: \frac{\partial f^{(l+1)}}{\partial f^{(l)}} \text{ is known} $$

$$ \delta^{(l)} = \frac{\partial}{\partial f^{(l)}} J(\theta;x,y) = \frac{\partial J}{\partial f^{(l+1)}} \frac{\partial f^{(l+1)}}{\partial f^{(l)}} = \delta^{(l+1)} \frac{\partial f^{(l+1)}}{\partial f^{(l)}} \quad \text{where } \frac{\partial J}{\partial f^{(l_{\max})}} \text{ is known} $$

$$ \nabla_{\theta^{(l)}} J(\theta;x,y) = \frac{\partial}{\partial \theta^{(l)}} J(\theta;x,y) = \frac{\partial J}{\partial f^{(l)}} \frac{\partial f^{(l)}_{\theta^{(l)}}}{\partial \theta^{(l)}} = \delta^{(l)} \frac{\partial f^{(l)}_{\theta^{(l)}}}{\partial \theta^{(l)}} \quad \text{where } \frac{\partial f^{(l)}_{\theta^{(l)}}}{\partial \theta^{(l)}} \text{ is known} $$

$$ \nabla_{\theta^{(l)}} J(\theta) = \sum_{i=1}^{m} \nabla_{\theta^{(l)}} J\left(\theta; x^{(i)}, y^{(i)}\right) $$
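A minimal sketch of this general recipe (an illustrative addition; the `Layer` interface, class names, and values are hypothetical, not from the memo): each function element exposes a forward pass and a backward pass that maps the incoming error signal to the outgoing one and, when the element has parameters, also stores a parameter gradient.

```python
import numpy as np

class Sigmoid:
    """Parameter-free function element; derivative w.r.t. its input is known symbolically."""
    def forward(self, x):
        self.a = 1.0 / (1.0 + np.exp(-x))
        return self.a
    def backward(self, delta):
        return delta * self.a * (1.0 - self.a)   # no parameter gradient

class Affine:
    """Function element with parameters (W, b); z = W^T x + b."""
    def __init__(self, W, b):
        self.W, self.b = W, b
    def forward(self, x):
        self.x = x
        return self.W.T @ x + self.b
    def backward(self, delta):
        self.grad_W = np.outer(self.x, delta)    # dJ/dW for this example
        self.grad_b = delta                      # dJ/db for this example
        return self.W @ delta                    # error signal for the element below

def backprop(layers, x, y):
    """Numerically plug error signals in backward, starting from the square error cost."""
    a = x
    for layer in layers:                 # forward through the composite function
        a = layer.forward(a)
    delta = -(y - a)                     # error signal of the cost function
    for layer in reversed(layers):       # propagate backward; parameterized elements
        delta = layer.backward(delta)    # also store their gradients
    return delta

# Example usage with a 3-2-1 network:
rng = np.random.default_rng(0)
net = [Affine(rng.normal(size=(3, 2)), np.zeros(2)), Sigmoid(),
       Affine(rng.normal(size=(2, 1)), np.zeros(1)), Sigmoid()]
backprop(net, np.array([0.5, -1.0, 2.0]), np.array([1.0]))
```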
8 /14
Convolutional Neural Network

A convolution-pooling layer in a Convolutional Neural Network is a composite function decomposed into the function elements f(conv), f(sigm), and f(pool). Let x be the output from the previous layer. The sigmoid nonlinearity is optional.

$$ \left(f^{(\mathrm{pool})} \circ f^{(\mathrm{sigm})} \circ f^{(\mathrm{conv})}_{w}\right)(x) $$

[Figure: forward propagation runs x → Convolution → Sigmoid → Pooling; backward propagation runs through the same elements in reverse.]
9 /14
Derivatives of Convolution

■ Discrete convolution parameterized by a feature w, and its derivatives
Let x be the input and y be the output of the convolution layer. Here we focus on only one feature vector w, although a convolution layer usually has multiple features W = [w1 w2 … wn]. n indexes x and y, where 1 ≤ n ≤ |x| for xn and 1 ≤ n ≤ |y| = |x| − |w| + 1 for yn. i indexes w, where 1 ≤ i ≤ |w|. (f∗g)[n] denotes the n-th element of f∗g.

$$ y = x \ast w = [y_n], \qquad y_n = (x \ast w)[n] = \sum_{i=1}^{|w|} x_{n+i-1} w_i = w^T x_{n:n+|w|-1} $$

$$ \frac{\partial y_{n-i+1}}{\partial x_n} = w_i, \quad \frac{\partial y_n}{\partial w_i} = x_{n+i-1} \quad \text{for } 1 \le i \le |w| $$

[Figure: connectivity of the convolution. yn has incoming connections from xn:n+|w|−1. From the standpoint of a fixed xn, xn has outgoing connections to yn−|w|+1:n, i.e., all of yn−|w|+1:n have derivatives w.r.t. xn. Note that the y and w indices run in reverse order.]
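To check the convolution definition and its derivatives numerically, here is a small sketch (an illustrative addition; values are arbitrary). It uses the memo's definition y_n = Σ_i x_{n+i−1} w_i, i.e. without a kernel flip.

```python
import numpy as np

def conv_valid(x, w):
    """Memo's convolution: y_n = sum_i x_{n+i-1} w_i, with |y| = |x| - |w| + 1."""
    return np.array([x[n:n + len(w)] @ w for n in range(len(x) - len(w) + 1)])

x = np.array([1.0, 2.0, -1.0, 0.5, 3.0])
w = np.array([0.2, -0.5, 1.0])
y = conv_valid(x, w)                      # |y| = 5 - 3 + 1 = 3

# dy_n / dw_i = x_{n+i-1}: perturb w_i and compare with the analytic derivative.
eps, i, n = 1e-6, 1, 0                    # 0-based indices here
w_pert = w.copy(); w_pert[i] += eps
numeric = (conv_valid(x, w_pert)[n] - y[n]) / eps
analytic = x[n + i]                       # x_{n+i-1} in the memo's 1-based indexing
assert np.isclose(numeric, analytic)

# dy_{n-i+1} / dx_n = w_i: each x_n feeds the outputs y_{n-|w|+1 : n}.
x_pert = x.copy(); x_pert[2] += eps       # perturb x_3 (1-based)
dy_dx3 = (conv_valid(x_pert, w) - y) / eps
# dy_dx3 is approximately [w_3, w_2, w_1]: the nonzero entries appear in
# reverse order of w, as the memo notes.
```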
10 /14
Backpropagation in Convolution Layer

Error signals and the gradient for each example are computed by convolution, using the commutativity property of convolution and the multivariable chain rule. Let's focus on single elements of the error signals and of the gradient w.r.t. w.

$$ \delta^{(x)}_n = \frac{\partial J}{\partial x_n} = \frac{\partial J}{\partial y}\frac{\partial y}{\partial x_n} = \sum_{i=1}^{|w|} \frac{\partial J}{\partial y_{n-i+1}} \frac{\partial y_{n-i+1}}{\partial x_n} = \sum_{i=1}^{|w|} \delta^{(y)}_{n-i+1} w_i = \left(\delta^{(y)} \ast \mathrm{flip}(w)\right)[n], \qquad \delta^{(x)} = \left[\delta^{(x)}_n\right] = \delta^{(y)} \ast \mathrm{flip}(w) $$

(The sum over i is a reverse-order linear combination, hence the flipped feature.)

$$ \frac{\partial J}{\partial w_i} = \frac{\partial J}{\partial y}\frac{\partial y}{\partial w_i} = \sum_{n=1}^{|x|-|w|+1} \frac{\partial J}{\partial y_n} \frac{\partial y_n}{\partial w_i} = \sum_{n=1}^{|x|-|w|+1} \delta^{(y)}_n x_{n+i-1} = \left(\delta^{(y)} \ast x\right)[i], \qquad \frac{\partial J}{\partial w} = \left[\frac{\partial J}{\partial w_i}\right] = \delta^{(y)} \ast x = x \ast \delta^{(y)} $$

[Figure: forward propagation x ∗ w = y (valid convolution); backward propagation δ(x) = flip(w) ∗ δ(y) (full convolution); gradient computation x ∗ δ(y) = ∂J/∂w (valid convolution).]
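A sketch of these two results (an illustrative addition; `delta_y` stands for an error signal arriving from the layer above, with arbitrary values): δ(x) is a full convolution of δ(y) with flip(w), and ∂J/∂w is a valid convolution of x with δ(y). A direct chain-rule computation is included as a sanity check.

```python
import numpy as np

def conv_valid(x, w):
    """Memo's convolution: y_n = sum_i x_{n+i-1} w_i."""
    return np.array([x[n:n + len(w)] @ w for n in range(len(x) - len(w) + 1)])

def conv_full(x, w):
    """Full convolution: zero-pad x by |w|-1 on both sides, then valid convolution."""
    return conv_valid(np.pad(x, len(w) - 1), w)

x = np.array([1.0, 2.0, -1.0, 0.5, 3.0])
w = np.array([0.2, -0.5, 1.0])
y = conv_valid(x, w)
delta_y = np.array([0.1, -0.3, 0.7])      # error signal from the layer above, |y| = 3

# Error signal w.r.t. the input: delta(x) = delta(y) * flip(w)  (full convolution).
delta_x = conv_full(delta_y, np.flip(w))

# Gradient w.r.t. the feature: dJ/dw = delta(y) * x = x * delta(y)  (valid convolution).
grad_w = conv_valid(x, delta_y)

# Sanity check against the direct chain rule dJ/dx_n = sum_i delta(y)_{n-i+1} w_i.
direct = np.array([sum(delta_y[n - i] * w[i]
                       for i in range(len(w)) if 0 <= n - i < len(delta_y))
                   for n in range(len(x))])
assert np.allclose(delta_x, direct)
```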
11 /14
Derivatives of Pooling

A pooling layer subsamples statistics to obtain summary statistics with any aggregate function (or filter) g whose input is a vector and whose output is a scalar. Subsampling is an operation like convolution; however, g is applied to disjoint (non-overlapping) regions.

■ Definition: subsample (or downsample)
Let m be the size of the pooling region, x be the input, and y be the output of the pooling layer. subsample(f, g)[n] denotes the n-th element of subsample(f, g).

$$ y_n = \mathrm{subsample}(x, g)[n] = g\left(x_{(n-1)m+1:nm}\right), \qquad y = \mathrm{subsample}(x, g) = [y_n] $$

$$ g(x) = \begin{cases} \dfrac{1}{m}\displaystyle\sum_{k=1}^{m} x_k, & \dfrac{\partial g}{\partial x} = \dfrac{1}{m} & \text{mean pooling} \\[10pt] \max(x), & \dfrac{\partial g}{\partial x_i} = \begin{cases} 1 & \text{if } x_i = \max(x) \\ 0 & \text{otherwise} \end{cases} & \text{max pooling} \\[10pt] \|x\|_p = \left(\displaystyle\sum_{k=1}^{m} x_k^p\right)^{1/p}, & \dfrac{\partial g}{\partial x_i} = \left(\displaystyle\sum_{k=1}^{m} x_k^p\right)^{1/p-1} x_i^{p-1} & L_p \text{ pooling} \\[10pt] \text{or any other differentiable } \mathbb{R}^m \to \mathbb{R} \text{ function} \end{cases} $$

[Figure: pooling applies g to each disjoint region x(n−1)m+1:nm of size m to produce yn.]
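A small sketch of subsample for the mean and max cases (an illustrative addition; it assumes |x| is a multiple of the region size m, and the values are arbitrary):

```python
import numpy as np

def subsample(x, g, m):
    """Apply the aggregate function g to disjoint regions of size m: y_n = g(x_{(n-1)m+1:nm})."""
    regions = x.reshape(-1, m)          # assumes len(x) is a multiple of m
    return np.array([g(r) for r in regions])

x = np.array([1.0, 3.0, -2.0, 0.5, 4.0, 2.0])
m = 2

y_mean = subsample(x, np.mean, m)   # mean pooling
y_max  = subsample(x, np.max,  m)   # max pooling

# Derivatives of g per region, used later for upsampling:
#   mean pooling: dg/dx_i = 1/m for every unit in the region
#   max pooling:  dg/dx_i = 1 where x_i attains the max, 0 elsewhere
```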
12 /14
Backpropagation in Pooling Layer

Error signals for each example are computed by upsampling. Upsampling is an operation which backpropagates (distributes) the error signals over the aggregate function g using its derivatives g'n = ∂g/∂x(n−1)m+1:nm. g'n can change depending on the pooling region n.

□ In max pooling, the unit which was the max at forward propagation receives all of the error at backward propagation, and that unit differs from region to region.

■ Definition: upsample
upsample(f, g')[n] denotes the n-th element of upsample(f, g').

$$ \delta^{(x)}_{(n-1)m+1:nm} = \mathrm{upsample}\left(\delta^{(y)}, g'\right)[n] = \delta^{(y)}_n g'_n = \delta^{(y)}_n \frac{\partial g}{\partial x_{(n-1)m+1:nm}} = \frac{\partial J}{\partial y_n} \frac{\partial y_n}{\partial x_{(n-1)m+1:nm}} = \frac{\partial J}{\partial x_{(n-1)m+1:nm}} $$

$$ \delta^{(x)} = \mathrm{upsample}\left(\delta^{(y)}, g'\right) = \left[\delta^{(x)}_{(n-1)m+1:nm}\right] $$

[Figure: forward propagation (subsampling) maps each region x(n−1)m+1:nm through g to yn; backward propagation (upsampling) distributes δ(y)n over δ(x)(n−1)m+1:nm via ∂g/∂x.]
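A sketch of upsample for the mean and max cases (an illustrative addition, matching the `subsample` sketch above; the δ(y) values are arbitrary):

```python
import numpy as np

def upsample_mean(delta_y, m):
    """Mean pooling backward: each unit in region n receives delta_y[n] * (1/m)."""
    return np.repeat(delta_y, m) / m

def upsample_max(delta_y, x, m):
    """Max pooling backward: the unit that was the max at forward time receives all of delta_y[n]."""
    regions = x.reshape(-1, m)
    mask = (regions == regions.max(axis=1, keepdims=True)).astype(float)
    return (mask * delta_y[:, None]).reshape(-1)

x = np.array([1.0, 3.0, -2.0, 0.5, 4.0, 2.0])
m = 2
delta_y = np.array([0.2, -0.1, 0.5])   # error signal from the layer above, one per region

delta_x_mean = upsample_mean(delta_y, m)
delta_x_max  = upsample_max(delta_y, x, m)
```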
13 /14
Backpropagation in CNN (Summary)

1. Propagate the error signals δ(pool) through upsampling and the derivative of the sigmoid:

$$ \delta^{(\mathrm{conv})} = \mathrm{upsample}\left(\delta^{(\mathrm{pool})}, g'\right) \bullet f^{(\mathrm{sigm})} \bullet \left(1 - f^{(\mathrm{sigm})}\right) $$

2. Propagate the error signals δ(conv) to the input by full convolution with the flipped feature:

$$ \delta^{(x)} = \delta^{(\mathrm{conv})} \ast \mathrm{flip}(w) $$

3. Compute the gradient ∇w J by valid convolution:

$$ x \ast \delta^{(\mathrm{conv})} = \nabla_w J $$

[Figure: x → Convolution → Sigmoid → Pooling; the error signals δ(pool), δ(sigm), δ(conv), δ(x) are plugged in backward through pooling, the sigmoid derivative, and the convolution.]
14 /14
Remarks

■ References
□ UFLDL Tutorial, http://ufldl.stanford.edu/tutorial
□ Chain Rule of Neural Network is Error Back Propagation, http://like.silk.to/studymemo/ChainRuleNeuralNetwork.pdf

■ Acknowledgement
This memo was written thanks to a good discussion with Prof. Masayuki Tanaka.