The Back Propagation Learning Algorithm

    The Back Propagation Learning Algorithm - Presentation Transcript

    1. The Back Propagation Learning Algorithm
       - For networks with hidden units.
       - An error-correcting algorithm.
       - Solves the credit (blame) assignment problem.
    2. What is supervised learning?
       Can we teach a network to associate a pattern of inputs with the corresponding outputs? That is, given an initial set of weights, how can they be adapted to produce the desired output?
       Use a training set:

       person   workload   pay    P(happy)
       a        0.1        0.9    0.95
       b        0.3        0.7    0.8
       c        0.07       0.2    0.2
       d        0.9        0.9    0.3
       e        0.7        0.5    ??
       f        0.4        0.8    ??

       [Figure: network with inputs w (workload) and p (pay) and output y; plot of payment against workload showing the points a to f, with e and f marked "?"]
       After training, how does the network generalise to patterns unseen during learning?
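    3. Learning by Error Correction
       In the perceptron there was a binary-valued output $y$ and a target $t$.
       [Figure: perceptron with inputs $x_1, x_2, \dots, x_N$, weights $w_1, w_2, \dots, w_N$, output $y$ and target $t$; the step function plotted against $\sum_i w_i x_i$]
       $$y = \mathrm{step}\Big(\sum_{i=0}^{N} w_i x_i\Big)$$
       Define this error measure:
       $$E = \frac{1}{2}(t - y)^2$$
       Because $y$ and $t$ are binary, $(t - y)^2$ is 1 for an incorrect output and 0 for a correct one, so summed over the training patterns $E$ counts the incorrect outputs (up to the factor 1/2). We want to design a weight-changing procedure that minimises $E$.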
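    4. Learning by Error Correction
       How do we change the weights $w_0, w_1, \dots, w_N$ so that the error $E$ decreases?
       [Figure: error $E$ plotted against a weight $w_i$, with negative slope to the left of the minimum and positive slope to the right]
       Suppose the error $E$ varies with the weight $w_i$ as in the figure. If we could measure the slope $\partial E / \partial w_i$, then changing the weights by the negative of the slope would move them towards the minimum of $E$:
       - slope positive: $\Delta w_i$ negative
       - slope negative: $\Delta w_i$ positive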
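    5. More Perceptron Problems
       For the perceptron, $E$ can't be differentiated with respect to the weights $w_0, w_1, \dots, w_N$, because $E$ involves the output $y$, which is not differentiable:
       $$E = \frac{1}{2}(t - y)^2, \qquad y = \mathrm{step}\Big(\sum_{i=0}^{N} w_i x_i\Big)$$
       Threshold unit:
       $$y = \begin{cases} 1 & \text{if } \sum_{i=0}^{N} w_i x_i > 0 \\ 0 & \text{if } \sum_{i=0}^{N} w_i x_i \le 0 \end{cases}$$
       Sigmoid unit:
       $$y = \frac{1}{1 + \exp\big(-\sum_{i=0}^{N} w_i x_i\big)}$$
       [Figures: output $y$ of each unit plotted against $\sum_i w_i x_i$, rising from 0 to 1]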
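To make the contrast on slide 5 concrete, here is a minimal sketch of the two unit types in plain Python. The function names, the example weights, the convention that index 0 holds a bias input fixed at 1, and the choice to output 1 only for a strictly positive sum are all assumptions made for illustration.

```python
import math

def weighted_sum(weights, inputs):
    """Sum of w_i * x_i over all inputs (index 0 is the bias input, fixed at 1)."""
    return sum(w * x for w, x in zip(weights, inputs))

def threshold_unit(weights, inputs):
    """Step output: jumps from 0 to 1, so it offers no useful slope."""
    return 1 if weighted_sum(weights, inputs) > 0 else 0

def sigmoid_unit(weights, inputs):
    """Smooth output in (0, 1): differentiable with respect to every weight."""
    return 1.0 / (1.0 + math.exp(-weighted_sum(weights, inputs)))

# Hypothetical weights and inputs; inputs[0] = 1 plays the role of the bias.
weights = [-1.0, 2.0, 2.0]
inputs = [1.0, 1.0, 1.0]

hard = threshold_unit(weights, inputs)
soft = sigmoid_unit(weights, inputs)

# A tiny weight change leaves this step output unchanged,
# but shifts the sigmoid output slightly.
nudged = [-1.0, 2.0, 2.001]
hard_nudged = threshold_unit(nudged, inputs)
soft_nudged = sigmoid_unit(nudged, inputs)
```

The sigmoid's response to a small weight change is what gives an error-correcting procedure a slope to follow.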
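    6. Gradient Descent
       The error $E$ is now a differentiable function of the weights.
       [Figure: error $E$ plotted against a weight $w_i$, with negative slope to the left of the minimum and positive slope to the right]
       Change the weights using the negative slope, scaled by a learning constant $\eta$:
       $$\Delta w_i = -\eta \frac{\partial E}{\partial w_i}$$
       - $\partial E / \partial w_i$ positive: $\Delta w_i$ negative
       - $\partial E / \partial w_i$ negative: $\Delta w_i$ positive
       Either way the weight moves towards the minimum of $E$. This approach is called gradient descent.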
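The rule on slide 6 can be tried on a one-weight toy problem. The sketch below assumes a hypothetical error curve E(w) = (w - 3)^2, chosen only because its minimum (w = 3) is known; the function names, the learning constant 0.1 and the step count are likewise illustrative.

```python
def slope(w):
    """dE/dw for the hypothetical error curve E(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

def gradient_descent(slope_fn, w, eta=0.1, steps=100):
    """Repeatedly apply delta_w = -eta * dE/dw."""
    for _ in range(steps):
        w = w - eta * slope_fn(w)
    return w

# Left of the minimum the slope is negative, so the first change is positive.
first_step = -0.1 * slope(0.0)

# Starting on either side of the minimum, the weight ends up close to w = 3.
w_left = gradient_descent(slope, w=0.0)
w_right = gradient_descent(slope, w=10.0)
```

On this curve each step shrinks the distance to the minimum by a constant factor; a real network's error surface is not this simple, so convergence there depends on the learning constant and the starting weights.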
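    7. Derivation of Back Propagation
       [Figure: three-layer network with inputs $x_1, \dots, x_N$ (index $k$), hidden units $v_1, \dots, v_N$ (index $j$) and outputs $y_1, \dots, y_N$ (index $i$); weight $u_{jk}$ joins input $x_k$ to hidden unit $v_j$, and weight $w_{ij}$ joins hidden unit $v_j$ to output $y_i$]
       output: $$y_i = \mathrm{sig}\Big(\sum_j w_{ij} v_j\Big)$$
       hidden: $$v_j = \mathrm{sig}\Big(\sum_k u_{jk} x_k\Big)$$
       error: $$E = \frac{1}{2} \sum_p \sum_i \big(t_i^p - y_i^p\big)^2$$
       We need to find the derivatives of $E$ with respect to the weights $w_{ij}$ and $u_{jk}$.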
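    8. Preliminaries
       [Figure: chain $x_k \to u_{jk} \to v_j \to w_{ij} \to y_i$]
       On a single pattern (drop $p$):
       $$E = \frac{1}{2} \sum_i (t_i - y_i)^2 \qquad \text{and} \qquad y_i = \frac{1}{1 + \exp\big(-\sum_j w_{ij} v_j\big)}$$
       Note that:
       $$\frac{\partial y_i}{\partial w_{ij}} = y_i (1 - y_i)\, v_j \qquad \frac{\partial y_i}{\partial v_j} = y_i (1 - y_i)\, w_{ij}$$
       since if
       $$y = \frac{1}{1 + \exp(-x)} \qquad \text{then} \qquad \frac{dy}{dx} = y (1 - y)$$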
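The identity dy/dx = y(1 - y) from slide 8 can be checked numerically. This is a small sketch, not part of the original slides: it compares the closed form with a central finite difference, and the function names, the step size h and the test points are illustrative choices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_slope(x):
    """Closed form from slide 8: dy/dx = y * (1 - y)."""
    y = sigmoid(x)
    return y * (1.0 - y)

def numeric_slope(f, x, h=1e-6):
    """Central finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Largest disagreement between the two over a few sample points.
points = [-2.0, 0.0, 0.6, 3.0]
largest_gap = max(abs(sigmoid_slope(x) - numeric_slope(sigmoid, x)) for x in points)
```

The slope peaks at 0.25 when x = 0 and shrinks towards zero as the output saturates near 0 or 1.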
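    9. Between Hidden and Output: $\partial E / \partial w_{ij}$
       [Figure: chain $x_k \to u_{jk} \to v_j \to w_{ij} \to y_i$]
       For weights between hidden units and output units:
       $$E = \frac{1}{2} \sum_i (t_i - y_i)^2$$
       $$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial y_i} \frac{\partial y_i}{\partial w_{ij}}$$
       $$\frac{\partial E}{\partial y_i} = (y_i - t_i) \qquad \frac{\partial y_i}{\partial w_{ij}} = y_i (1 - y_i)\, v_j$$
       $$\frac{\partial E}{\partial w_{ij}} = \underbrace{(y_i - t_i)\, y_i (1 - y_i)}_{\text{call this } \delta_i}\, v_j$$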
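    10. Between Input and Hidden: $\partial E / \partial u_{jk}$
        [Figure: chain $x_k \to u_{jk} \to v_j \to w_{ij} \to y_i$]
        For weights between input units and hidden units. A hidden unit $v_j$ feeds every output, so the chain rule sums over the outputs $i$:
        $$E = \frac{1}{2} \sum_i (t_i - y_i)^2$$
        $$\frac{\partial E}{\partial u_{jk}} = \sum_i \frac{\partial E}{\partial y_i} \frac{\partial y_i}{\partial v_j} \frac{\partial v_j}{\partial u_{jk}}$$
        $$\frac{\partial E}{\partial y_i} = (y_i - t_i) \qquad \frac{\partial y_i}{\partial v_j} = y_i (1 - y_i)\, w_{ij} \qquad \frac{\partial v_j}{\partial u_{jk}} = v_j (1 - v_j)\, x_k$$
        $$\frac{\partial E}{\partial u_{jk}} = \sum_i (y_i - t_i)\, y_i (1 - y_i)\, w_{ij}\, v_j (1 - v_j)\, x_k = \sum_i \delta_i\, w_{ij}\, v_j (1 - v_j)\, x_k$$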
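A common way to sanity-check derivatives like those on slides 9 and 10 is to compare them with finite differences. The sketch below does so for a network with two inputs, two hidden units and a single output, so the sum over outputs has only one term. The weights and pattern are borrowed from the worked example on slide 13; the function names, the list layout (index 0 holds the bias weight) and the step size h are assumptions for illustration.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(u, w, x):
    """u[j] = [u_j0, u_j1, u_j2], w = [w_0, w_1, w_2]; index 0 multiplies a bias input of 1."""
    xs = [1.0] + list(x)
    vs = [1.0] + [sigmoid(sum(u_jk * x_k for u_jk, x_k in zip(u_j, xs))) for u_j in u]
    y = sigmoid(sum(w_j * v_j for w_j, v_j in zip(w, vs)))
    return xs, vs, y

def error(u, w, x, t):
    return 0.5 * (t - forward(u, w, x)[2]) ** 2

def analytic_gradients(u, w, x, t):
    xs, vs, y = forward(u, w, x)
    delta = (y - t) * y * (1.0 - y)           # output delta
    grad_w = [delta * v_j for v_j in vs]      # dE/dw_j = delta * v_j
    grad_u = [[delta * w[j + 1] * vs[j + 1] * (1.0 - vs[j + 1]) * x_k for x_k in xs]
              for j in range(len(u))]         # dE/du_jk = delta * w_j * v_j(1 - v_j) * x_k
    return grad_w, grad_u

def numeric_grad_u(u, w, x, t, j, k, h=1e-6):
    """Central finite difference of E with respect to u[j][k]."""
    plus = [row[:] for row in u]
    minus = [row[:] for row in u]
    plus[j][k] += h
    minus[j][k] -= h
    return (error(plus, w, x, t) - error(minus, w, x, t)) / (2.0 * h)

u = [[-1.0, 2.0, 2.0], [-1.0, 0.8, 0.8]]
w = [-1.0, 2.0, -1.0]
x, t = (1.0, 1.0), 0.0

grad_w, grad_u = analytic_gradients(u, w, x, t)
max_gap = max(abs(grad_u[j][k] - numeric_grad_u(u, w, x, t, j, k))
              for j in range(2) for k in range(3))
```

If the analytic formulas are right, `max_gap` should be many orders of magnitude smaller than the gradients themselves.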
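    11. Between Hidden and Output: $\Delta w_{ij}$
        [Figure: chain $x_k \to u_{jk} \to v_j \to w_{ij} \to y_i$]
        Modifying weights between hidden units and output units using gradient descent:
        $$\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}} = -\eta\, \underbrace{(y_i - t_i)\, y_i (1 - y_i)}_{\delta_i}\, v_j$$
        - $\eta$: learning constant
        - $(y_i - t_i)$: error
        - $y_i (1 - y_i)$: small for $y_i$ close to 0 or 1
        - $v_j$: "input"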
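    12. Between Input and Hidden: $\Delta u_{jk}$
        [Figure: chain $x_k \to u_{jk} \to v_j \to w_{ij} \to y_i$]
        Modifying weights between input units and hidden units using gradient descent:
        $$\Delta u_{jk} = -\eta \frac{\partial E}{\partial u_{jk}} = -\eta \sum_i \delta_i\, w_{ij}\, v_j (1 - v_j)\, x_k$$
        The factor $\sum_i \delta_i w_{ij}$ is the back propagation of error: the output errors are passed back through the weights to the hidden unit. The same procedure is applicable to a net with many hidden layers.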
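The two update rules on slides 11 and 12 can be combined into a single training step. The following is a minimal sketch for one hidden layer and several outputs, with no bias terms (as in the formulas on slide 7). The function names, the nested-list weight layout (`u[j][k]`, `w[i][j]`), the example network and the learning constant 0.5 are all hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(u, w, x):
    """Return hidden activations v and outputs y."""
    v = [sigmoid(sum(u_jk * x_k for u_jk, x_k in zip(u_j, x))) for u_j in u]
    y = [sigmoid(sum(w_ij * v_j for w_ij, v_j in zip(w_i, v))) for w_i in w]
    return v, y

def error(u, w, x, t):
    _, y = forward(u, w, x)
    return 0.5 * sum((t_i - y_i) ** 2 for t_i, y_i in zip(t, y))

def backprop_step(u, w, x, t, eta):
    """One gradient-descent update on a single pattern; returns new (u, w)."""
    v, y = forward(u, w, x)
    # Output deltas: delta_i = (y_i - t_i) * y_i * (1 - y_i)
    delta = [(y_i - t_i) * y_i * (1.0 - y_i) for y_i, t_i in zip(y, t)]
    # Error propagated back to hidden unit j: sum over outputs of delta_i * w_ij
    back = [sum(delta[i] * w[i][j] for i in range(len(w))) for j in range(len(v))]
    new_w = [[w[i][j] - eta * delta[i] * v[j] for j in range(len(v))]
             for i in range(len(w))]
    new_u = [[u[j][k] - eta * back[j] * v[j] * (1.0 - v[j]) * x[k]
              for k in range(len(x))] for j in range(len(u))]
    return new_u, new_w

# Hypothetical network with 2 inputs, 2 hidden units and 2 outputs.
u = [[0.5, -0.3], [0.2, 0.8]]
w = [[1.0, -1.0], [0.4, 0.6]]
x, t = [1.0, 0.5], [1.0, 0.0]

before = error(u, w, x, t)
u2, w2 = backprop_step(u, w, x, t, eta=0.5)
after = error(u2, w2, x, t)
```

Both weight layers are updated from the same forward pass, using the old values of `w` when propagating the deltas back; updating `w` first and then reusing it for `u` would compute a different step.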
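    13. An Example
        [Figure: network with two inputs $x_1, x_2$, two hidden units $v_1, v_2$ and one output $y$; each unit also has a bias input fixed at 1]
        Weights:
        - $u_{11} = 2.0$, $u_{12} = 2.0$, $u_{10} = -1.0$
        - $u_{21} = 0.8$, $u_{22} = 0.8$, $u_{20} = -1.0$
        - $w_1 = 2.0$, $w_2 = -1.0$, $w_0 = -1.0$
        Training set:

        x1   x2   target t
        0    0    0
        0    1    1
        1    0    1
        1    1    0

        The values below are for the pattern $x_1 = 1$, $x_2 = 1$, $t = 0$.
        hidden: $$v_1 = \mathrm{sig}(u_{11} x_1 + u_{12} x_2 + u_{10}) = 0.9526$$
        $$v_2 = \mathrm{sig}(u_{21} x_1 + u_{22} x_2 + u_{20}) = 0.6457$$
        output: $$y = \mathrm{sig}(w_1 v_1 + w_2 v_2 + w_0) = 0.5645$$
        error: $$E = \frac{1}{2}(t - y)^2 = 0.1593$$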
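The forward pass on slide 13 can be reproduced in a few lines. This sketch uses the slide's weights and takes the pattern to be x1 = 1, x2 = 1 with target 0, which is the pattern that yields the slide's numbers; the variable names are illustrative.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Weights from slide 13 (the index-0 weights multiply a bias input of 1).
u11, u12, u10 = 2.0, 2.0, -1.0
u21, u22, u20 = 0.8, 0.8, -1.0
w1, w2, w0 = 2.0, -1.0, -1.0

# Input pattern and target.
x1, x2, t = 1.0, 1.0, 0.0

v1 = sigmoid(u11 * x1 + u12 * x2 + u10)   # hidden unit 1
v2 = sigmoid(u21 * x1 + u22 * x2 + u20)   # hidden unit 2
y = sigmoid(w1 * v1 + w2 * v2 + w0)       # output unit
E = 0.5 * (t - y) ** 2                    # error on this pattern

print(f"v1={v1:.4f} v2={v2:.4f} y={y:.4f} E={E:.4f}")
# prints: v1=0.9526 v2=0.6457 y=0.5645 E=0.1593
```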
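    14. An Example: updating the weights
        Learning constant: $\eta = 1.0$
        output:
        $$\delta = (y - t)\, y (1 - y) = 0.1388$$
        $$\Delta w_0 = -\eta\, \delta \cdot 1.0 = -0.1388$$
        $$\Delta w_1 = -\eta\, \delta\, v_1 = -0.1322$$
        $$\Delta w_2 = -\eta\, \delta\, v_2 = -0.0896$$
        hidden (to $v_1$):
        $$\Delta u_{10} = -\eta\, \delta\, w_1 v_1 (1 - v_1) \cdot 1.0 = -0.0125$$
        $$\Delta u_{11} = -\eta\, \delta\, w_1 v_1 (1 - v_1)\, x_1 = -0.0125$$
        $$\Delta u_{12} = -\eta\, \delta\, w_1 v_1 (1 - v_1)\, x_2 = -0.0125$$
        hidden (to $v_2$):
        $$\Delta u_{20} = -\eta\, \delta\, w_2 v_2 (1 - v_2) \cdot 1.0 = 0.0318$$
        $$\Delta u_{21} = -\eta\, \delta\, w_2 v_2 (1 - v_2)\, x_1 = 0.0318$$
        $$\Delta u_{22} = -\eta\, \delta\, w_2 v_2 (1 - v_2)\, x_2 = 0.0318$$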
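The weight changes on slide 14 follow directly from the forward-pass values. The sketch below recomputes them from the slide 13 weights with a learning constant of 1.0; the variable names are illustrative, and the slide's reported figures appear in the comments for comparison.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Forward pass with the slide 13 weights on the pattern x1 = 1, x2 = 1, t = 0.
w1, w2, w0 = 2.0, -1.0, -1.0
x1, x2, t = 1.0, 1.0, 0.0
v1 = sigmoid(2.0 * x1 + 2.0 * x2 - 1.0)
v2 = sigmoid(0.8 * x1 + 0.8 * x2 - 1.0)
y = sigmoid(w1 * v1 + w2 * v2 + w0)

eta = 1.0                            # learning constant
delta = (y - t) * y * (1.0 - y)      # slide reports 0.1388

# Hidden-to-output changes (the bias input is 1.0).
dw0 = -eta * delta * 1.0             # slide reports -0.1388
dw1 = -eta * delta * v1              # slide reports -0.1322
dw2 = -eta * delta * v2              # slide reports -0.0896

# Input-to-hidden changes; because x1 = x2 = 1 the three changes
# for each hidden unit are equal.
du10 = -eta * delta * w1 * v1 * (1.0 - v1) * 1.0   # slide reports -0.0125
du11 = -eta * delta * w1 * v1 * (1.0 - v1) * x1
du12 = -eta * delta * w1 * v1 * (1.0 - v1) * x2
du20 = -eta * delta * w2 * v2 * (1.0 - v2) * 1.0   # slide reports 0.0318
du21 = -eta * delta * w2 * v2 * (1.0 - v2) * x1
du22 = -eta * delta * w2 * v2 * (1.0 - v2) * x2
```

The changes into the second hidden unit are positive because its outgoing weight w2 is negative: raising v2 lowers y, which is what the target of 0 calls for.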
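    15. An Example: a New Error
        [Figure: the same network as in the previous example, with updated weights]
        Weights:
        - $u_{11} = 1.98$, $u_{12} = 1.98$, $u_{10} = -1.01$
        - $u_{21} = 0.83$, $u_{22} = 0.83$, $u_{20} = -0.96$
        - $w_1 = 1.86$, $w_2 = -1.08$, $w_0 = -1.13$
        Training set:

        x1   x2   target t
        0    0    0
        0    1    1
        1    0    1
        1    1    0

        For the same pattern ($x_1 = 1$, $x_2 = 1$, $t = 0$):
        hidden: $$v_1 = \mathrm{sig}(u_{11} x_1 + u_{12} x_2 + u_{10}) = 0.9509$$
        $$v_2 = \mathrm{sig}(u_{21} x_1 + u_{22} x_2 + u_{20}) = 0.6672$$
        output: $$y = \mathrm{sig}(w_1 v_1 + w_2 v_2 + w_0) = 0.4776$$
        error: $$E = \frac{1}{2}(t - y)^2 = 0.1140$$
        The error has reduced for this pattern.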
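The reduction in error reported on slide 15 can be reproduced by applying one update and repeating the forward pass. This is a sketch under the same assumptions as the example (pattern x1 = 1, x2 = 1, target 0, learning constant 1.0); the function names and the list layout, with index 0 holding the bias weight, are illustrative. Because the code keeps full precision while the slide appears to work from rounded values, intermediate figures may differ in the last decimal place.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(u, w, x):
    """u[j] = [u_j0, u_j1, u_j2]; w = [w0, w1, w2]; index 0 is the bias weight."""
    v = [sigmoid(u_j[0] + u_j[1] * x[0] + u_j[2] * x[1]) for u_j in u]
    y = sigmoid(w[0] + w[1] * v[0] + w[2] * v[1])
    return v, y

def update(u, w, x, t, eta):
    """One gradient-descent step on a single pattern; returns new (u, w)."""
    v, y = forward(u, w, x)
    delta = (y - t) * y * (1.0 - y)
    hidden_in = [1.0, x[0], x[1]]          # bias input followed by x1, x2
    new_u = [[u[j][k] - eta * delta * w[j + 1] * v[j] * (1.0 - v[j]) * hidden_in[k]
              for k in range(3)] for j in range(2)]
    new_w = [w[0] - eta * delta,
             w[1] - eta * delta * v[0],
             w[2] - eta * delta * v[1]]
    return new_u, new_w

u = [[-1.0, 2.0, 2.0], [-1.0, 0.8, 0.8]]
w = [-1.0, 2.0, -1.0]
x, t = (1.0, 1.0), 0.0

_, y_old = forward(u, w, x)
u_new, w_new = update(u, w, x, t, eta=1.0)
v_new, y_new = forward(u_new, w_new, x)

error_old = 0.5 * (t - y_old) ** 2     # slide 13 reports 0.1593
error_new = 0.5 * (t - y_new) ** 2     # slide 15 reports 0.1140
```

A single step only guarantees progress on the pattern it was computed from; the other three patterns in the training set may get better or worse until they are trained on too.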
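    16. Summary
        Credit-assignment problem solved for hidden units:
        [Figure: a unit with error $\delta$, lying between input and output, connected through weights $w_1, w_2, w_3$ to units with errors $\delta_1, \delta_2, \delta_3$]
        $$\delta = f'(a) \sum_i w_i \delta_i$$
        where the $\delta_i$ are the errors of the units this unit feeds, $a$ is the total input to the unit, and $f'$ is the first derivative of the activation function (sigmoid).
        Outstanding issues:
        1. Number of layers; number and type of units in each layer
        2. Learning rates
        3. Local or distributed representations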

    © All Rights Reserved
