2013-1 Machine Learning Lecture 04 - Michael Negnevitsky - Artificial neural networks: Unsupervised learning

  1. Slides are based on Negnevitsky, Pearson Education, 2005. Lecture 8. Artificial neural networks: Unsupervised learning. Contents: Introduction; Hebbian learning; Generalised Hebbian learning algorithm; Competitive learning; Self-organising computational map: Kohonen network; Summary.
  2. Introduction. The main property of a neural network is an ability to learn from its environment, and to improve its performance through learning. So far we have considered supervised or active learning: learning with an external “teacher” or a supervisor who presents a training set to the network. But another type of learning also exists: unsupervised learning.
  3. In contrast to supervised learning, unsupervised or self-organised learning does not require an external teacher. During the training session, the neural network receives a number of different input patterns, discovers significant features in these patterns and learns how to classify input data into appropriate categories. Unsupervised learning tends to follow the neuro-biological organisation of the brain. Unsupervised learning algorithms aim to learn rapidly and can be used in real time.
  4. Hebbian learning. In 1949, Donald Hebb proposed one of the key ideas in biological learning, commonly known as Hebb’s Law. Hebb’s Law states that if neuron i is near enough to excite neuron j and repeatedly participates in its activation, the synaptic connection between these two neurons is strengthened and neuron j becomes more sensitive to stimuli from neuron i.
  5. Hebb’s Law can be represented in the form of two rules: 1. If two neurons on either side of a connection are activated synchronously, then the weight of that connection is increased. 2. If two neurons on either side of a connection are activated asynchronously, then the weight of that connection is decreased. Hebb’s Law provides the basis for learning without a teacher. Learning here is a local phenomenon occurring without feedback from the environment.
  6. [Figure: Hebbian learning in a neural network. A presynaptic neuron i feeds a postsynaptic neuron j in a layer between the input signals and the output signals.]
  7. Using Hebb’s Law we can express the adjustment applied to the weight $w_{ij}$ at iteration p in the following form:
      $\Delta w_{ij}(p) = F[\,y_j(p), x_i(p)\,]$
      As a special case, we can represent Hebb’s Law as follows:
      $\Delta w_{ij}(p) = \alpha\, y_j(p)\, x_i(p)$
      where $\alpha$ is the learning rate parameter. This equation is referred to as the activity product rule.
  8. Hebbian learning implies that weights can only increase. To resolve this problem, we might impose a limit on the growth of synaptic weights. It can be done by introducing a non-linear forgetting factor into Hebb’s Law:
      $\Delta w_{ij}(p) = \alpha\, y_j(p)\, x_i(p) - \varphi\, y_j(p)\, w_{ij}(p)$
      where $\varphi$ is the forgetting factor. The forgetting factor usually falls in the interval between 0 and 1, typically between 0.01 and 0.1, to allow only a little “forgetting” while limiting the weight growth.
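As a rough illustration, here is a one-step NumPy sketch of this rule; the weight-matrix layout, the function name, and the example values of alpha and phi are assumptions made for illustration, not taken from the slides. Setting phi to zero recovers the plain activity product rule of the previous slide.

```python
import numpy as np

def hebbian_update(W, x, y, alpha=0.1, phi=0.02):
    """One generalised Hebbian update with a forgetting factor.

    W : (n_inputs, n_outputs) weight matrix, W[i, j] = w_ij
    x : (n_inputs,)  input vector  x_i(p)
    y : (n_outputs,) output vector y_j(p)
    """
    # delta w_ij = alpha * y_j * x_i  -  phi * y_j * w_ij   (phi = 0 gives plain Hebb's Law)
    return W + alpha * np.outer(x, y) - phi * y[None, :] * W

# Illustrative values: two inputs, two outputs, both output neurons active
W = np.array([[0.5, 0.0],
              [0.0, 0.5]])
print(hebbian_update(W, np.array([1.0, 0.0]), np.array([1.0, 1.0])))
```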
  9. Hebbian learning algorithm. Step 1: Initialisation. Set initial synaptic weights and thresholds to small random values, say in an interval [0, 1]. Step 2: Activation. Compute the neuron output at iteration p:
      $y_j(p) = \sum_{i=1}^{n} x_i(p)\, w_{ij}(p) - \theta_j$
      where n is the number of neuron inputs, and $\theta_j$ is the threshold value of neuron j.
  10. Step 3: Learning. Update the weights in the network:
      $w_{ij}(p+1) = w_{ij}(p) + \Delta w_{ij}(p)$
      where $\Delta w_{ij}(p)$ is the weight correction at iteration p. The weight correction is determined by the generalised activity product rule:
      $\Delta w_{ij}(p) = \varphi\, y_j(p)\,[\,\lambda\, x_i(p) - w_{ij}(p)\,]$
      which, with $\lambda = \alpha/\varphi$, is equivalent to the forgetting-factor rule of slide 8.
      Step 4: Iteration. Increase iteration p by one, go back to Step 2.
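A minimal NumPy sketch of Steps 1-4 follows. The 0/1 hard-limit activation, the fixed number of epochs, the value of phi, and the function name are assumptions made here; lam = alpha/phi makes the update match the forgetting-factor rule of slide 8.

```python
import numpy as np

def train_hebbian(patterns, alpha=0.1, phi=0.02, epochs=50, seed=0):
    """Sketch of the generalised Hebbian learning algorithm (Steps 1-4)."""
    rng = np.random.default_rng(seed)
    n = patterns.shape[1]
    W = rng.random((n, n))            # Step 1: random weights in [0, 1]; W[i, j] = w_ij
    theta = rng.random(n)             #         and random thresholds
    lam = alpha / phi                 # so phi*y*(lam*x - w) == alpha*y*x - phi*y*w
    for _ in range(epochs):                                   # Step 4: iterate
        for x in patterns:
            y = np.where(x @ W - theta >= 0, 1.0, 0.0)        # Step 2: hard-limit activation
            W += phi * y[None, :] * (lam * x[:, None] - W)    # Step 3: generalised rule
    return W, theta

# Example usage with two 5-dimensional training patterns
W, theta = train_hebbian(np.array([[0., 1., 0., 0., 1.],
                                   [0., 0., 1., 0., 0.]]))
```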
  11. Hebbian learning example. To illustrate Hebbian learning, consider a fully connected feedforward network with a single layer of five computation neurons. Each neuron is represented by a McCulloch and Pitts model with the sign activation function. The network is trained on the following set of input vectors:
      $X_1 = [0\; 0\; 0\; 0\; 0]^T$, $X_2 = [0\; 1\; 0\; 0\; 1]^T$, $X_3 = [0\; 0\; 0\; 1\; 0]^T$, $X_4 = [0\; 0\; 1\; 0\; 0]^T$, $X_5 = [0\; 1\; 0\; 0\; 1]^T$
  12. [Figure: Initial and final states of the network. Five inputs x1–x5 feed five output neurons y1–y5; the diagrams contrast the network before and after training.]
  13. Initial and final weight matrices. The initial weight matrix is the 5-by-5 identity matrix. After training, the final weight matrix becomes
      $W = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 2.0204 & 0 & 0 & 2.0204 \\ 0 & 0 & 1.0200 & 0 & 0 \\ 0 & 0 & 0 & 0.9996 & 0 \\ 0 & 2.0204 & 0 & 0 & 2.0204 \end{bmatrix}$
  14. A test input vector, or probe, is defined as $X = [1\; 0\; 0\; 0\; 1]^T$. When this probe is presented to the network, we obtain:
      $Y = \operatorname{sign}(W X - \theta) = [0\; 1\; 0\; 0\; 1]^T$
      where $\theta = [0.4940\;\; 0.2661\;\; 0.0907\;\; 0.9478\;\; 0.0737]^T$ is the threshold vector.
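To check the recalled response, the following sketch presents the probe to the trained network. The weight matrix and thresholds are the reconstructed values above, and the sign activation is interpreted as a 0/1 hard limiter, which is how the output vector is coded on the slide; both are assumptions of this sketch rather than statements from the source.

```python
import numpy as np

# Final weight matrix and thresholds as reconstructed from the example slides
W = np.array([[0, 0,      0,      0,      0     ],
              [0, 2.0204, 0,      0,      2.0204],
              [0, 0,      1.0200, 0,      0     ],
              [0, 0,      0,      0.9996, 0     ],
              [0, 2.0204, 0,      0,      2.0204]])
theta = np.array([0.4940, 0.2661, 0.0907, 0.9478, 0.0737])

x_probe = np.array([1, 0, 0, 0, 1])      # the probe X
net = W @ x_probe - theta                # net input of each output neuron (W is symmetric)
y = np.where(net >= 0, 1, 0)             # hard-limit ("sign") activation, 0/1 coded
print(y)                                 # -> [0 1 0 0 1]
```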
  15. Competitive learning. In competitive learning, neurons compete among themselves to be activated. While in Hebbian learning several output neurons can be activated simultaneously, in competitive learning only a single output neuron is active at any time. The output neuron that wins the “competition” is called the winner-takes-all neuron.
  16. The basic idea of competitive learning was introduced in the early 1970s. In the late 1980s, Teuvo Kohonen introduced a special class of artificial neural networks called self-organising feature maps. These maps are based on competitive learning.
  17. What is a self-organising feature map? Our brain is dominated by the cerebral cortex, a very complex structure of billions of neurons and hundreds of billions of synapses. The cortex includes areas that are responsible for different human activities (motor, visual, auditory, somatosensory, etc.) and are associated with different sensory inputs. We can say that each sensory input is mapped into a corresponding area of the cerebral cortex. The cortex is a self-organising computational map in the human brain.
  18. [Figure: Feature-mapping Kohonen model. Panels (a) and (b) show an input layer connected to a two-dimensional Kohonen layer for two different input patterns.]
  19. The Kohonen network. The Kohonen model provides a topological mapping. It places a fixed number of input patterns from the input layer into a higher-dimensional output or Kohonen layer. Training in the Kohonen network begins with the winner’s neighbourhood of a fairly large size. Then, as training proceeds, the neighbourhood size gradually decreases.
  20. [Figure: Architecture of the Kohonen network. Input signals x1 and x2 feed an input layer that is fully connected to an output layer of neurons y1, y2 and y3.]
  21. The lateral connections are used to create a competition between neurons. The neuron with the largest activation level among all neurons in the output layer becomes the winner. This neuron is the only neuron that produces an output signal. The activity of all other neurons is suppressed in the competition. The lateral feedback connections produce excitatory or inhibitory effects, depending on the distance from the winning neuron. This is achieved by the use of a Mexican hat function which describes synaptic weights between neurons in the Kohonen layer.
  22. [Figure: The Mexican hat function of lateral connection. Connection strength plotted against distance: a short-range excitatory effect around the winner, surrounded by inhibitory effects at larger distances.]
  23. In the Kohonen network, a neuron learns by shifting its weights from inactive connections to active ones. Only the winning neuron and its neighbourhood are allowed to learn. If a neuron does not respond to a given input pattern, then learning cannot occur in that particular neuron. The competitive learning rule defines the change $\Delta w_{ij}$ applied to synaptic weight $w_{ij}$ as
      $\Delta w_{ij} = \alpha\,(x_i - w_{ij})$ if neuron j wins the competition, and $\Delta w_{ij} = 0$ if neuron j loses the competition,
      where $x_i$ is the input signal and $\alpha$ is the learning rate parameter.
  24. The overall effect of the competitive learning rule resides in moving the synaptic weight vector $W_j$ of the winning neuron j towards the input pattern X. The matching criterion is equivalent to the minimum Euclidean distance between vectors. The Euclidean distance between a pair of n-by-1 vectors X and $W_j$ is defined by
      $d = \| X - W_j \| = \left[ \sum_{i=1}^{n} (x_i - w_{ij})^2 \right]^{1/2}$
      where $x_i$ and $w_{ij}$ are the ith elements of the vectors X and $W_j$, respectively.
  25. To identify the winning neuron, $j_X$, that best matches the input vector X, we may apply the following condition:
      $\| X - W_{j_X} \| = \min_j \| X - W_j \|, \quad j = 1, 2, \ldots, m$
      where m is the number of neurons in the Kohonen layer.
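A minimal NumPy sketch of this winner selection; storing the weight vectors as rows of a matrix and the function name are assumptions made for illustration.

```python
import numpy as np

def find_winner(x, W):
    """Return the index of the neuron whose weight vector is closest to x.

    x : (n,) input vector
    W : (m, n) matrix whose rows are the weight vectors W_j
    """
    distances = np.linalg.norm(W - x, axis=1)   # Euclidean distance to every W_j
    return int(np.argmin(distances))            # index of the winner-takes-all neuron j_X
```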
  26. Suppose, for instance, that the 2-dimensional input vector X is presented to the three-neuron Kohonen network, where $X = [0.52\;\; 0.12]^T$. The initial weight vectors $W_j$ are given by $W_1 = [0.27\;\; 0.81]^T$, $W_2 = [0.42\;\; 0.70]^T$, $W_3 = [0.43\;\; 0.21]^T$.
  27. We find the winning (best-matching) neuron $j_X$ using the minimum-distance Euclidean criterion:
      $d_1 = \sqrt{(x_1 - w_{11})^2 + (x_2 - w_{21})^2} = \sqrt{(0.52 - 0.27)^2 + (0.12 - 0.81)^2} = 0.73$
      $d_2 = \sqrt{(x_1 - w_{12})^2 + (x_2 - w_{22})^2} = \sqrt{(0.52 - 0.42)^2 + (0.12 - 0.70)^2} = 0.59$
      $d_3 = \sqrt{(x_1 - w_{13})^2 + (x_2 - w_{23})^2} = \sqrt{(0.52 - 0.43)^2 + (0.12 - 0.21)^2} = 0.13$
      Neuron 3 is the winner and its weight vector $W_3$ is updated according to the competitive learning rule:
      $\Delta w_{13} = \alpha\,(x_1 - w_{13}) = 0.1\,(0.52 - 0.43) = 0.01$
      $\Delta w_{23} = \alpha\,(x_2 - w_{23}) = 0.1\,(0.12 - 0.21) = -0.01$
  28. The updated weight vector $W_3$ at iteration (p + 1) is determined as:
      $W_3(p+1) = W_3(p) + \Delta W_3(p) = \begin{bmatrix}0.43\\0.21\end{bmatrix} + \begin{bmatrix}0.01\\-0.01\end{bmatrix} = \begin{bmatrix}0.44\\0.20\end{bmatrix}$
      The weight vector $W_3$ of the winning neuron 3 becomes closer to the input vector X with each iteration.
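The arithmetic of this worked example can be checked with a short NumPy sketch; the variable names are illustrative.

```python
import numpy as np

x = np.array([0.52, 0.12])            # input vector X
W = np.array([[0.27, 0.81],           # W_1
              [0.42, 0.70],           # W_2
              [0.43, 0.21]])          # W_3
alpha = 0.1

d = np.linalg.norm(W - x, axis=1)     # Euclidean distances d_1, d_2, d_3
winner = np.argmin(d)                 # index 2, i.e. neuron 3
print(np.round(d, 2))                 # -> [0.73 0.59 0.13]

W[winner] += alpha * (x - W[winner])  # competitive learning update of the winner
print(np.round(W[winner], 2))         # -> [0.44 0.2]
```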
  29. Competitive Learning Algorithm. Step 1: Initialisation. Set initial synaptic weights to small random values, say in an interval [0, 1], and assign a small positive value to the learning rate parameter $\alpha$.
  30. Step 2: Activation and Similarity Matching. Activate the Kohonen network by applying the input vector X, and find the winner-takes-all (best matching) neuron $j_X$ at iteration p, using the minimum-distance Euclidean criterion
      $j_X(p) = \min_j \| X - W_j(p) \|$, where $\| X - W_j(p) \| = \left[ \sum_{i=1}^{n} (x_i - w_{ij}(p))^2 \right]^{1/2}$, $j = 1, 2, \ldots, m$
      where n is the number of neurons in the input layer, and m is the number of neurons in the Kohonen layer.
  31. Step 3: Learning. Update the synaptic weights
      $w_{ij}(p+1) = w_{ij}(p) + \Delta w_{ij}(p)$
      where $\Delta w_{ij}(p)$ is the weight correction at iteration p. The weight correction is determined by the competitive learning rule:
      $\Delta w_{ij}(p) = \alpha\,[x_i - w_{ij}(p)]$ if $j \in \Lambda_j(p)$, and $\Delta w_{ij}(p) = 0$ if $j \notin \Lambda_j(p)$,
      where $\alpha$ is the learning rate parameter, and $\Lambda_j(p)$ is the neighbourhood function centred around the winner-takes-all neuron $j_X$ at iteration p.
  32. Step 4: Iteration. Increase iteration p by one, go back to Step 2 and continue until the minimum-distance Euclidean criterion is satisfied, or no noticeable changes occur in the feature map.
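The four steps can be sketched in NumPy as follows. Restricting the neighbourhood to the winner alone, stopping after a fixed number of epochs, and the function and parameter names are simplifying assumptions made here, not prescriptions from the slides.

```python
import numpy as np

def competitive_learning(patterns, m, alpha=0.1, epochs=100, seed=0):
    """Sketch of Steps 1-4 with a winner-only neighbourhood.

    patterns : (n_patterns, n) array of training input vectors
    m        : number of neurons in the Kohonen layer
    """
    rng = np.random.default_rng(seed)
    n = patterns.shape[1]
    W = rng.random((m, n))                                 # Step 1: random weights in [0, 1]
    for _ in range(epochs):                                # Step 4: iterate
        for x in patterns:
            j = np.argmin(np.linalg.norm(W - x, axis=1))   # Step 2: winner-takes-all neuron
            W[j] += alpha * (x - W[j])                     # Step 3: competitive learning rule
    return W
```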
  33. Competitive learning in the Kohonen network. To illustrate competitive learning, consider the Kohonen network with 100 neurons arranged in the form of a two-dimensional lattice with 10 rows and 10 columns. The network is required to classify two-dimensional input vectors: each neuron in the network should respond only to the input vectors occurring in its region. The network is trained with 1000 two-dimensional input vectors generated randomly in a square region in the interval between –1 and +1. The learning rate parameter $\alpha$ is equal to 0.1.
  34. [Figure: Initial random weights, plotted as W(1,j) versus W(2,j) on the interval [–1, 1].]
  35. [Figure: Network after 100 iterations, plotted as W(1,j) versus W(2,j).]
  36. [Figure: Network after 1000 iterations, plotted as W(1,j) versus W(2,j).]
  37. [Figure: Network after 10,000 iterations, plotted as W(1,j) versus W(2,j).]
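The experiment of slide 33 can be approximated with the sketch below. The Gaussian neighbourhood and its decay schedule, the initial weight range, and the per-step sampling are assumptions, since the slides only state that the neighbourhood shrinks as training proceeds.

```python
import numpy as np

rng = np.random.default_rng(0)
rows, cols = 10, 10                                   # 10 x 10 Kohonen lattice (100 neurons)
X = rng.uniform(-1, 1, size=(1000, 2))                # 1000 random 2-D training vectors
W = rng.uniform(-0.1, 0.1, size=(rows * cols, 2))     # small random initial weights

# Lattice coordinates of each neuron, used to measure neighbourhood distance on the map
grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

alpha = 0.1                                           # learning rate from slide 33
n_iter = 10_000
sigma0 = 5.0                                          # initial neighbourhood radius (assumed)

for p in range(n_iter):
    x = X[p % len(X)]
    j = np.argmin(np.linalg.norm(W - x, axis=1))      # winner-takes-all neuron
    sigma = sigma0 * np.exp(-p / (n_iter / np.log(sigma0)))   # shrinking neighbourhood
    h = np.exp(-np.sum((grid - grid[j]) ** 2, axis=1) / (2 * sigma ** 2))
    W += alpha * h[:, None] * (x - W)                 # update the winner and its neighbours
```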
  38. One form of unsupervised learning is clustering. Among neural network models, the Self-Organizing Map (SOM) and Adaptive Resonance Theory (ART) are commonly used unsupervised learning algorithms.
      – The SOM is a topographic organization in which nearby locations in the map represent inputs with similar properties.
      – ART networks are used for many pattern recognition tasks, such as automatic target recognition and seismic signal processing. The first version of ART was "ART1", developed by Carpenter and Grossberg (1988).
