- 3. Activation functions • Unipolar • Bipolar
- 5. Example Consider a feedforward neural network with n inputs, m hidden units (tanh activation), and l output units (linear activation). vji is the weight from input i to hidden unit j, and wkj is the weight from hidden unit j to output unit k. Given an error function E, we can find its partial derivatives with respect to the weights (backpropagation) and apply gradient descent.
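The architecture above can be sketched in a few lines of NumPy. This is a minimal illustration, not the slide's own code; the squared error E = ½‖o − d‖², the learning rate eta, and all concrete values are assumptions, since the slide's error expression was not given.

```python
import numpy as np

# Network from the example: n inputs, m tanh hidden units, l linear outputs.
# V[j, i] is the weight from input i to hidden unit j (slide's vji);
# W[k, j] is the weight from hidden unit j to output unit k (slide's wkj).
def forward(V, W, x):
    h = np.tanh(V @ x)        # hidden activations
    o = W @ h                 # linear output units
    return h, o

def backprop_step(V, W, x, d, eta=0.05):
    h, o = forward(V, W, x)
    e = o - d                            # output error (assumed squared-error loss)
    dW = np.outer(e, h)                  # dE/dW for E = 0.5 * ||o - d||^2
    dh = (W.T @ e) * (1.0 - h**2)        # error backpropagated through tanh
    dV = np.outer(dh, x)                 # dE/dV
    return V - eta * dV, W - eta * dW    # gradient-descent update

rng = np.random.default_rng(1)
V = rng.standard_normal((4, 3)) * 0.5    # m=4 hidden units, n=3 inputs (assumed sizes)
W = rng.standard_normal((2, 4)) * 0.5    # l=2 output units
x = np.array([1.0, -0.5, 0.2])
d = np.array([0.3, -0.7])
for _ in range(500):
    V, W = backprop_step(V, W, x, d)
# After training, the network output approaches the target d.
```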
- 6. Hebbian Learning Rule • The learning signal is equal simply to the neuron's output (Hebb 1949): r = f(wiᵗx) = oi. • The increment of the weight vector becomes Δwi = c f(wiᵗx) x. • The single weight wij is adapted using Δwij = c f(wiᵗx) xj, for j = 1, 2, ..., n.
- 7. Hebbian Learning Rule • This rule represents a purely feed forward, unsupervised learning. • This rule states that if the crossproduct of the output and the input or correlation term is positive, this results in an increase of weight, otherwise the weight decreases.
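A single Hebbian step can be sketched as follows. The learning constant c, the bipolar sign activation, and the example vectors are illustrative assumptions, not values from the slides.

```python
import numpy as np

def hebbian_update(w, x, c=0.1, f=np.sign):
    """One Hebbian step: delta_w = c * f(w.T @ x) * x.

    w : weight vector of the neuron
    x : input vector
    c : learning constant (hypothetical value)
    f : activation function (bipolar sign here)
    """
    return w + c * f(w @ x) * x

w = np.array([1.0, -1.0, 0.0])
x = np.array([1.0, -2.0, 1.5])
w_new = hebbian_update(w, x)
# Here f(w.T @ x) = sign(3) = +1 (positive correlation), so every
# weight moves toward its input component, i.e. the weights increase
# in the direction of x, as the rule states.
```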
- 8. Perceptron Learning Rule • The learning signal is the difference between the desired and the actual neuron's response (Rosenblatt 1958). Thus, learning is supervised and the learning signal is equal to r = di − oi, where oi = sgn(wiᵗx) and di is the desired response. • Weight adjustments are obtained as Δwi = c [di − sgn(wiᵗx)] x, or, for the single weight, Δwij = c [di − sgn(wiᵗx)] xj, for j = 1, 2, ..., n.
- 9. Perceptron Learning Rule • This rule is applicable only for binary neuron response, and the above relationships express the rule for the bipolar binary case. • Here, the weights are adjusted if and only if the response oi is incorrect. • Since the desired response is either +1 or −1, the weight adjustment reduces to Δwi = ±2cx, where the + sign applies when di = +1 and sgn(wiᵗx) = −1, and the − sign applies when di = −1 and sgn(wiᵗx) = +1. • The weight adjustment is zero when the desired and the actual responses agree.
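The bipolar perceptron step above can be sketched like this; the learning constant c and the sample vectors are hypothetical.

```python
import numpy as np

def perceptron_update(w, x, d, c=0.5):
    """One perceptron step: delta_w = c * (d - sgn(w.T @ x)) * x,
    with bipolar desired response d in {-1, +1}."""
    o = 1.0 if w @ x >= 0 else -1.0    # bipolar binary response
    return w + c * (d - o) * x

w = np.array([0.5, -0.5])
x = np.array([1.0, 2.0])
# w.T @ x = -0.5, so o = -1; with d = +1 the response is wrong and the
# correction is +2c*x, matching the +/-2cx form on the slide.
w_new = perceptron_update(w, x, d=1.0)
# A second presentation of the same x is now classified correctly,
# so d and o agree and the weights stay unchanged.
w_same = perceptron_update(w_new, x, d=1.0)
```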
- 10. Delta Learning Rule • The delta learning rule is valid only for continuous activation functions, and only in the supervised training mode. • The learning signal for this rule is called delta and is defined as r = [di − f(wiᵗx)] f′(wiᵗx), • where f′(wiᵗx) is the derivative of the activation function f(net) computed at net = wiᵗx.
- 12. Delta Learning Rule • The learning rule can be derived from the condition of least squared error between oi and di. • Calculating the gradient vector with respect to wi of the squared error defined as E = ½ (di − oi)², • which is equivalent to E = ½ [di − f(wiᵗx)]²,
- 13. Delta Learning Rule • we obtain the error gradient vector ∇E = −(di − oi) f′(wiᵗx) x. • The components of the gradient vector are ∂E/∂wij = −(di − oi) f′(wiᵗx) xj, for j = 1, 2, ..., n. • Since the minimization of the error requires the weight changes to be in the negative gradient direction, we take Δwi = −η∇E, where η is a positive constant.
- 14. Delta Learning Rule • We then obtain Δwi = η (di − oi) f′(neti) x, • or, for the single weight, the adjustment becomes Δwij = η (di − oi) f′(neti) xj, for j = 1, 2, ..., n. • Note that the weight adjustments are computed based on minimization of the squared error.
- 15. Delta Learning Rule • Considering the use of the general learning rule and plugging in the learning signal, the weight adjustment becomes Δwi = c (di − oi) f′(neti) x.
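The delta rule above, with tanh as a concrete continuous activation, can be sketched as follows; the learning constant, target value, and input are illustrative assumptions.

```python
import numpy as np

def delta_update(w, x, d, c=0.1):
    """One delta-rule step: delta_w = c * (d - o) * f'(net) * x,
    where net = w.T @ x, o = tanh(net), and f'(net) = 1 - tanh(net)**2."""
    net = w @ x
    o = np.tanh(net)
    f_prime = 1.0 - o**2            # derivative of tanh
    return w + c * (d - o) * f_prime * x

w = np.zeros(2)
x = np.array([1.0, -1.0])
for _ in range(50):                 # repeated presentations of one sample
    w = delta_update(w, x, d=0.5)
o = np.tanh(w @ x)                  # the output approaches the target 0.5,
                                    # since each step descends the squared error
```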
- 16. Widrow-Hoff Learning Rule • The Widrow-Hoff learning rule is applicable for the supervised training of neural networks. • It is independent of the activation function of the neurons used, since it minimizes the squared error between the desired output value di and the neuron's activation value neti = wiᵗx.
- 17. Widrow-Hoff Learning Rule • The learning signal for this rule is defined as r = di − wiᵗx. • The weight vector increment under this learning rule is Δwi = c (di − wiᵗx) x, • or, for the single weight, the adjustment is Δwij = c (di − wiᵗx) xj, for j = 1, 2, ..., n. • This rule can be considered a special case of the delta learning rule.
- 18. Widrow-Hoff Learning Rule • This follows by assuming f(wiᵗx) = wiᵗx, i.e., the activation function is simply the identity function f(net) = net, so f′(net) = 1. • This rule is sometimes called the LMS (least mean square) learning rule. • Weights may be initialized at any values in this method.
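A sketch of the LMS rule, fitting a linear target from random samples; the learning constant, number of steps, and the target weights are hypothetical.

```python
import numpy as np

def lms_update(w, x, d, c=0.05):
    """One Widrow-Hoff (LMS) step: delta_w = c * (d - w.T @ x) * x.
    The activation is the identity, so no derivative term appears."""
    return w + c * (d - w @ x) * x

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])      # hypothetical target weights
w = np.zeros(2)                     # weights may start at any values
for _ in range(500):
    x = rng.standard_normal(2)
    d = w_true @ x                  # desired value of net for this input
    w = lms_update(w, x, d)
# w converges toward w_true, since each step descends the squared error
# (d - w.T @ x)**2 for the current sample.
```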
- 19. Correlation Learning Rule • By substituting r = di into the general learning rule we obtain the correlation learning rule. • The adjustments for the weight vector and the single weights, respectively, are Δwi = c di x and Δwij = c di xj, for j = 1, 2, ..., n.
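The correlation rule is a one-liner; the example values are arbitrary. It has the Hebbian form with the output oi replaced by the desired response di.

```python
import numpy as np

def correlation_update(w, x, d, c=0.1):
    """One correlation-rule step: delta_w = c * d * x (learning signal r = d)."""
    return w + c * d * x

w = correlation_update(np.zeros(3), np.array([1.0, 0.0, -1.0]), d=1.0)
# Each weight records c * d * xj, the correlation of the desired
# response with the corresponding input component.
```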
- 20. Winner-Take-All Learning Rule • The winner-take-all learning rule is used for learning statistical properties of inputs. • The learning is based on the premise that one of the neurons in the layer, say the mth, has the maximum response due to input x, as shown in the figure. • This neuron is declared the winner. As a result of this winning event, the weight vector wm,
- 22. Winner-Take-All Learning Rule • wm = [wm1 wm2 … wmn]ᵗ, • containing the weights highlighted in the figure, is the only one adjusted in the given unsupervised learning step. • Its increment is computed as Δwm = α (x − wm), • or, the individual weight adjustment becomes Δwmj = α (xj − wmj), for j = 1, 2, ..., n.
- 23. Winner-Take-All Learning Rule • Here α > 0 is a small learning constant, typically decreasing as learning progresses. • The winner selection is based on the following criterion of maximum activation among all p neurons participating in the competition: wmᵗx = max(wiᵗx), i = 1, 2, ..., p.
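One winner-take-all step can be sketched as follows, with the weight vectors stored as rows of a matrix; the learning constant and the example values are illustrative assumptions.

```python
import numpy as np

def wta_update(W, x, alpha=0.2):
    """One winner-take-all step: the neuron with the largest w_i.T @ x
    wins, and only its weight row moves toward the input:
    delta_w_m = alpha * (x - w_m)."""
    m = int(np.argmax(W @ x))       # winner selection: maximum activation
    W = W.copy()
    W[m] += alpha * (x - W[m])      # only the winner's weights are adjusted
    return W, m

W = np.array([[1.0, 0.0],           # row i = weight vector of neuron i
              [0.0, 1.0]])
x = np.array([0.0, 0.9])
W_new, winner = wta_update(W, x)
# Neuron 1 wins (w1.T @ x = 0.9 > 0.0); its weights move a fraction
# alpha of the way toward x, while neuron 0's row is untouched.
```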
- 24. Outstar Learning Rule • The weight adjustments in this rule are computed as Δwj = β (d − wj), • or, the individual adjustments are Δwmj = β (dm − wmj), for m = 1, 2, ..., p. • Note that, in contrast to any learning rule discussed so far, the adjusted weights fan out of the jth node in this learning
- 25. Outstar Learning Rule method, and the weight vector is defined accordingly as wj = [w1j w2j … wpj]ᵗ.
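An outstar step operates on the fan-out vector wj defined above; the learning constant β, the number of presentations, and the desired vector are hypothetical.

```python
import numpy as np

def outstar_update(w_j, d, beta=0.3):
    """One outstar step: delta_wj = beta * (d - wj), where wj collects the
    p weights fanning OUT of node j, and d is the desired vector of the
    p output-layer responses."""
    return w_j + beta * (d - w_j)

w_j = np.zeros(3)                    # p = 3 fan-out weights of node j
d = np.array([1.0, 0.5, -1.0])       # desired responses of the p nodes
for _ in range(20):
    w_j = outstar_update(w_j, d)
# The gap (d - wj) shrinks by the factor (1 - beta) per step, so wj
# converges geometrically to the desired vector d.
```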