
Example feedforward backpropagation


A little example of feedforward and backpropagation in a Convolutional Neural Network for classification.

Published in: Software


  1. A Little Example of Feedforward and Backpropagation in CNN • Edwin Efraín Jiménez Lepe
  2. Input (3×3) = [[16, 24, 32], [47, 18, 26], [68, 12, 9]]. Two 2×2 kernels W1 = [[0, 1], [-1, 0]] and [[2, 3], [4, 5]], with biases b1 = [1, 0]. Convolution (kernel rotated 180°, plus bias) gives [[24, -13], [51, -13]] and [[353, 354], [535, 248]]. ReLU gives [[24, 0], [51, 0]] and [[353, 354], [535, 248]].
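The forward pass on this slide can be sketched in NumPy. The 180° kernel rotation makes it a true convolution rather than a cross-correlation; the variable names are mine, not the slide's:

```python
import numpy as np

# Input image and the two 2x2 kernels with their biases, as on the slide.
x = np.array([[16., 24., 32.],
              [47., 18., 26.],
              [68., 12.,  9.]])
kernels = [np.array([[0., 1.], [-1., 0.]]),
           np.array([[2., 3.], [4., 5.]])]
biases = [1.0, 0.0]

def conv2d_valid(x, w, b):
    """'Valid' true convolution: slide the 180-degree-rotated kernel over x."""
    wr = np.rot90(w, 2)                      # rotate the kernel 180 degrees
    kh, kw = w.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * wr) + b
    return out

conv_maps = [conv2d_valid(x, w, b) for w, b in zip(kernels, biases)]
relu_maps = [np.maximum(m, 0.0) for m in conv_maps]
```

Running this reproduces the slide's two feature maps before and after ReLU.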
  3. Max pooling (2,2) over the ReLU maps [[24, 0], [51, 0]] and [[353, 354], [535, 248]] gives 51 and 535; reshape to X_h1 = [51, 535]. With wh1 = [[0.002, 0.03, 0.05, 0.07, 0.018], [0.016, 0.004, 0.006, 0.0062, 0.009]] and bh1 = [0, 0, 0, 0, 0]: X_h1_s = (X_h1 * wh1) + bh1 = [8.662, 3.67, 5.76, 6.887, 5.733], and X_h2 = sigmoid(X_h1_s) = [0.99982699, 0.97515646, 0.99685879, 0.99898007, 0.9967731].
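The pooling, reshape, and fully connected hidden layer from this slide can be written out as follows (a sketch with my own variable names, reusing the ReLU maps from the previous step):

```python
import numpy as np

# ReLU feature maps from the convolution step.
relu_maps = [np.array([[24., 0.], [51., 0.]]),
             np.array([[353., 354.], [535., 248.]])]

# (2,2) max pooling reduces each 2x2 map to its maximum; reshaping the two
# pooled values into a row vector feeds the fully connected hidden layer.
X_h1 = np.array([[m.max() for m in relu_maps]])        # [[51., 535.]]
wh1 = np.array([[0.002, 0.03,  0.05,  0.07,   0.018],
                [0.016, 0.004, 0.006, 0.0062, 0.009]])
bh1 = np.zeros(5)

X_h1_s = X_h1 @ wh1 + bh1                 # pre-activation
X_h2 = 1.0 / (1.0 + np.exp(-X_h1_s))      # sigmoid activation
```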
  4. Output layer: w_o = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 0.1]], b_o = [0, 0]. (X_h2 * w_o) + b_o = [2.48734086, 2.08700462]; softmax gives output = [0.59876844, 0.40123156]. Assume that Y = 1.
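The output layer can be sketched the same way; the max-shift inside the softmax is a standard numerical-stability trick, not something the slide mentions:

```python
import numpy as np

X_h2 = np.array([[0.99982699, 0.97515646, 0.99685879, 0.99898007, 0.9967731]])
w_o = np.array([[0.1, 0.2],
                [0.3, 0.4],
                [0.5, 0.6],
                [0.7, 0.8],
                [0.9, 0.1]])
b_o = np.zeros(2)

logits = X_h2 @ w_o + b_o               # pre-softmax scores
e = np.exp(logits - logits.max())       # shift for numerical stability
output = e / e.sum()                    # softmax probabilities
```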
  5. With output = [0.59876844, 0.40123156] and the true class Y = 1 (one-hot [0, 1]), the output-layer delta is d(x) = delta = a_L - Y.
  6. d(x) = delta = a_L - Y = [0.59876844, -0.59876844].
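For softmax combined with cross-entropy loss, the gradient at the output is simply the predicted probabilities minus the one-hot target, which is what the slide's delta computes:

```python
import numpy as np

output = np.array([[0.59876844, 0.40123156]])  # softmax probabilities
Y = np.array([[0.0, 1.0]])                     # class label 1 as one-hot
delta = output - Y                             # gradient w.r.t. the logits
```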
  7. Propagating the delta back through the output weights: d(x) = d(x+1) * w_o.T = [-0.05987684, -0.05987684, -0.05987684, -0.05987684, 0.47901476].
  8. Output-layer gradients: dw_o = X_h2.T * d(x+1) = [[0.59866485, -0.59866485], [0.58389291, -0.58389291], [0.59688758, -0.59688758], [0.59815774, -0.59815774], [0.59683628, -0.59683628]]; db_o = d(x+1) = [0.59876844, -0.59876844].
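Slides 7 and 8 together are two matrix products: the delta pushed back through w_o, and the outer product that gives the weight gradient. A minimal sketch:

```python
import numpy as np

X_h2 = np.array([[0.99982699, 0.97515646, 0.99685879, 0.99898007, 0.9967731]])
w_o = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 0.1]])
delta = np.array([[0.59876844, -0.59876844]])

d_hidden = delta @ w_o.T     # delta propagated to the hidden activations
dw_o = X_h2.T @ delta        # gradient for the 5x2 output weights
db_o = delta                 # gradient for the output biases
```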
  9. Back through the sigmoid (elementwise): d(x) = d(x+1) ∙ sigmoid(X_h1_s) ∙ (1 - sigmoid(X_h1_s)) = [-1.03573778e-05, -1.45059694e-03, -1.87495121e-04, -6.10079532e-05, 1.54074668e-03], with d(x+1) = [-0.05987684, -0.05987684, -0.05987684, -0.05987684, 0.47901476].
  10. Hidden-layer bias gradient: dbh1 = d(x+1) = [-1.03573778e-05, -1.45059694e-03, -1.87495121e-04, -6.10079532e-05, 1.54074668e-03].
  11. Hidden-layer weight gradient: dwh1 = X_h1.T * d(x+1) = [[-5.28226266e-04, -7.39804438e-02, -9.56225118e-03, -3.11140561e-03, 7.85780808e-02], [-5.54119710e-03, -7.76069362e-01, -1.00309890e-01, -3.26392550e-02, 8.24299476e-01]]. Delta at the pooled vector: d(x) = d(x+1) * wh1.T = [-2.94504954e-05, 6.39539432e-06].
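Slides 9 through 11 can be reproduced in a few lines. Note that sigmoid'(z) = sigmoid(z)·(1 − sigmoid(z)), and X_h2 already holds sigmoid(X_h1_s), so no re-evaluation is needed (variable names are mine):

```python
import numpy as np

X_h1 = np.array([[51., 535.]])
wh1 = np.array([[0.002, 0.03,  0.05,  0.07,   0.018],
                [0.016, 0.004, 0.006, 0.0062, 0.009]])
X_h2 = np.array([[0.99982699, 0.97515646, 0.99685879, 0.99898007, 0.9967731]])
d_hidden = np.array([[-0.05987684, -0.05987684, -0.05987684, -0.05987684, 0.47901476]])

d_pre = d_hidden * X_h2 * (1.0 - X_h2)   # through the sigmoid, elementwise
dbh1 = d_pre                             # bias gradient
dwh1 = X_h1.T @ d_pre                    # 2x5 weight gradient
d_pool = d_pre @ wh1.T                   # delta at the reshaped/pooled vector
```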
  12. Back through reshape and max pooling: d(x) = upsampling(d(x+1)). Each delta returns to the position that held the maximum in the forward pass: [[0, 0], [-2.94504954e-05, 0]] for the first map and [[0, 0], [6.39539432e-06, 0]] for the second.
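The "upsampling" step is max-pooling backprop: route each incoming delta to the argmax location of the forward pass and zero everywhere else. A sketch, assuming each map was pooled as a single 2×2 window:

```python
import numpy as np

relu_maps = [np.array([[24., 0.], [51., 0.]]),
             np.array([[353., 354.], [535., 248.]])]
d_pool = [-2.94504954e-05, 6.39539432e-06]

upsampled = []
for m, d in zip(relu_maps, d_pool):
    g = np.zeros_like(m)
    g[np.unravel_index(np.argmax(m), m.shape)] = d  # delta to the max's cell
    upsampled.append(g)
```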
  13. Back through ReLU: δL/δy_(l-1) = 0 if y_l < 0, else δL/δy_l. Here both deltas [[0, 0], [-2.94504954e-05, 0]] and [[0, 0], [6.39539432e-06, 0]] pass through unchanged, since their nonzero entries sit where the convolution outputs ([[24, -13], [51, -13]] and [[353, 354], [535, 248]]) are positive.
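The ReLU backprop rule from this slide is a masked copy, easily written with `np.where`:

```python
import numpy as np

# Pre-ReLU convolution outputs and the upsampled deltas from max pooling.
conv_maps = [np.array([[24., -13.], [51., -13.]]),
             np.array([[353., 354.], [535., 248.]])]
upsampled = [np.array([[0., 0.], [-2.94504954e-05, 0.]]),
             np.array([[0., 0.], [6.39539432e-06, 0.]])]

# Delta passes where the pre-activation was non-negative, zero elsewhere.
d_relu = [np.where(c < 0, 0.0, g) for c, g in zip(conv_maps, upsampled)]
```

In this example the deltas are unchanged, because their only nonzero entries fall on positive pre-activations.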
  14. Delta at the input, δx = δ(y) * rot180(w) (full convolution), summed over both kernels: [[0, 0, 0], [3.19769716e-05, 5.50320726e-05, 0], [-1.02643124e-05, 1.27907886e-05, 0]] (the third-row entry is negative: 3 · 6.39539432e-06 - 2.94504954e-05 = -1.02643124e-05).
  15. Kernel gradients: δw1 = input * δ(y) = [[-0.00070681, -0.00094242], [-0.00053011, -0.00076571]] for the first kernel and [[0.00015349, 0.00020465], [0.00011512, 0.00016628]] for the second; δx = δ(y) * rot180(w) (full convolution).
  16. Bias gradients: ∂J/∂b_k = Σ_(a,b) δ_k(y)_(a,b) = [-2.94504954e-05, 6.39539432e-06].
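The convolutional-layer gradients of slides 14 through 16 can be sketched as below, following the deck's convention that "*" rotates its second argument 180° (so "input * δ" is a plain cross-correlation of the input with rot180(δ), and the full convolution "δ * rot180(w)" is a full cross-correlation with w itself). The helper name is mine:

```python
import numpy as np

x = np.array([[16., 24., 32.],
              [47., 18., 26.],
              [68., 12.,  9.]])
kernels = [np.array([[0., 1.], [-1., 0.]]),
           np.array([[2., 3.], [4., 5.]])]
deltas = [np.array([[0., 0.], [-2.94504954e-05, 0.]]),
          np.array([[0., 0.], [6.39539432e-06, 0.]])]

def corr2d(a, b, full=False):
    """Valid (or zero-padded full) cross-correlation of a with b."""
    if full:
        a = np.pad(a, b.shape[0] - 1)
    out = np.zeros((a.shape[0] - b.shape[0] + 1, a.shape[1] - b.shape[1] + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a[i:i+b.shape[0], j:j+b.shape[1]] * b)
    return out

# Kernel gradients: input * delta_k, with the deck's rotated convention.
dW = [corr2d(x, np.rot90(d, 2)) for d in deltas]
# Bias gradients: sum of each delta map's entries.
db = [d.sum() for d in deltas]
# Input delta: full convolution delta_k * rot180(W_k), summed over kernels.
dx = sum(corr2d(d, w, full=True) for d, w in zip(deltas, kernels))
```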
