2013-1 Machine Learning Lecture 04 - Michael Negnevitsky - Artificial neural networks: Supervised learning




  1. © Negnevitsky, Pearson Education, 2005. Lecture 7. Artificial neural networks: Supervised learning. Topics: introduction, or how the brain works; the neuron as a simple computing element; the perceptron; multilayer neural networks; accelerated learning in multilayer neural networks; the Hopfield network; bidirectional associative memories (BAM); summary.
  2. Introduction, or how the brain works. Machine learning involves adaptive mechanisms that enable computers to learn from experience, learn by example and learn by analogy. Learning capabilities can improve the performance of an intelligent system over time. The most popular approaches to machine learning are artificial neural networks and genetic algorithms. This lecture is dedicated to neural networks.
  3. A neural network can be defined as a model of reasoning based on the human brain. The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons. The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them. By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.
  4. Each neuron has a very simple structure, but an army of such elements constitutes a tremendous processing power. A neuron consists of a cell body, soma; a number of fibers called dendrites; and a single long fiber called the axon.
  5. Figure: Biological neural network, showing two neurons with soma, dendrites, axon, and the synapses connecting them.
  6. Our brain can be considered as a highly complex, non-linear and parallel information-processing system. Information is stored and processed in a neural network simultaneously throughout the whole network, rather than at specific locations. In other words, in neural networks, both data and its processing are global rather than local. Learning is a fundamental and essential characteristic of biological neural networks. The ease with which they can learn led to attempts to emulate a biological neural network in a computer.
  7. An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain. The neurons are connected by weighted links passing signals from one neuron to another. The output signal is transmitted through the neuron's outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.
  8. Figure: Architecture of a typical artificial neural network; input signals enter an input layer, pass through a middle layer, and leave through an output layer.
  9. Analogy between biological and artificial neural networks: soma corresponds to neuron, dendrite to input, axon to output, and synapse to weight.
  10. The neuron as a simple computing element. Figure: diagram of a neuron with input signals x1, x2, ..., xn, weights w1, w2, ..., wn, and output signal Y.
  11. The neuron computes the weighted sum of the input signals and compares the result with a threshold value, θ. If the net input is less than the threshold, the neuron output is −1. But if the net input is greater than or equal to the threshold, the neuron becomes activated and its output attains a value +1. The neuron uses the following transfer, or activation, function: X = Σ_{i=1..n} x_i w_i, and Y = +1 if X ≥ θ, Y = −1 if X < θ. This type of activation function is called a sign function.
  12. Activation functions of a neuron: step function, Y_step = 1 if X ≥ 0, 0 if X < 0; sign function, Y_sign = +1 if X ≥ 0, −1 if X < 0; sigmoid function, Y_sigmoid = 1 / (1 + e^−X); linear function, Y_linear = X.
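The four activation functions above can be sketched in a few lines; the function names below are mine, not from the lecture.

```python
import math

def step(x):
    # Step function: 1 if x >= 0, else 0
    return 1 if x >= 0 else 0

def sign(x):
    # Sign function: +1 if x >= 0, else -1
    return 1 if x >= 0 else -1

def sigmoid(x):
    # Sigmoid function: 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    # Linear function: output equals the net input
    return x
```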
  13. Can a single neuron learn a task? In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a simple ANN: a perceptron. The perceptron is the simplest form of a neural network. It consists of a single neuron with adjustable synaptic weights and a hard limiter.
  14. Figure: Single-layer two-input perceptron; inputs x1 and x2 feed a linear combiner with weights w1 and w2 and threshold θ, followed by a hard limiter that produces output Y.
  15. The Perceptron. The operation of Rosenblatt's perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter. The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and −1 if it is negative.
  16. The aim of the perceptron is to classify inputs, x1, x2, ..., xn, into one of two classes, say A1 and A2. In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the linearly separable function: Σ_{i=1..n} x_i w_i − θ = 0.
  17. Figure: Linear separability in the perceptrons. (a) Two-input perceptron: the decision boundary x1 w1 + x2 w2 − θ = 0 separates class A1 from class A2. (b) Three-input perceptron: the decision boundary is the plane x1 w1 + x2 w2 + x3 w3 − θ = 0.
  18. How does the perceptron learn its classification tasks? This is done by making small adjustments in the weights to reduce the difference between the actual and desired outputs of the perceptron. The initial weights are randomly assigned, usually in the range [−0.5, 0.5], and then updated to obtain the output consistent with the training examples.
  19. If at iteration p, the actual output is Y(p) and the desired output is Yd(p), then the error is given by: e(p) = Yd(p) − Y(p), where p = 1, 2, 3, ... Iteration p here refers to the pth training example presented to the perceptron. If the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p).
  20. The perceptron learning rule: w_i(p + 1) = w_i(p) + α · x_i(p) · e(p), where p = 1, 2, 3, ... and α is the learning rate, a positive constant less than unity. The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the perceptron training algorithm for classification tasks.
  21. Perceptron's training algorithm. Step 1: Initialisation. Set initial weights w1, w2, ..., wn and threshold θ to random numbers in the range [−0.5, 0.5]. If the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p).
  22. Perceptron's training algorithm (continued). Step 2: Activation. Activate the perceptron by applying inputs x1(p), x2(p), ..., xn(p) and desired output Yd(p). Calculate the actual output at iteration p = 1: Y(p) = step[ Σ_{i=1..n} x_i(p) w_i(p) − θ ], where n is the number of the perceptron inputs, and step is a step activation function.
  23. Perceptron's training algorithm (continued). Step 3: Weight training. Update the weights of the perceptron: w_i(p + 1) = w_i(p) + Δw_i(p), where Δw_i(p) is the weight correction at iteration p. The weight correction is computed by the delta rule: Δw_i(p) = α · x_i(p) · e(p). Step 4: Iteration. Increase iteration p by one, go back to Step 2 and repeat the process until convergence.
  24. Example of perceptron learning: the logical operation AND. Threshold θ = 0.2; learning rate α = 0.1. Table: in each epoch the four training examples (x1, x2) = (0, 0), (0, 1), (1, 0), (1, 1) with desired outputs Yd = 0, 0, 0, 1 are presented and the weights are corrected after each error; the weights converge to w1 = 0.1, w2 = 0.1, which classify all four examples correctly.
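The AND example above can be reproduced with a minimal sketch of the training algorithm (Steps 1-4). The threshold and learning rate match the slide; the function name and the random seed are my own.

```python
import random

def train_perceptron(examples, theta=0.2, alpha=0.1, epochs=50, seed=0):
    """Perceptron learning rule: w_i(p+1) = w_i(p) + alpha * x_i(p) * e(p)."""
    rng = random.Random(seed)
    # Step 1: initialise weights randomly in [-0.5, 0.5]
    w = [rng.uniform(-0.5, 0.5) for _ in range(2)]
    for _ in range(epochs):
        converged = True
        for (x1, x2), yd in examples:
            # Step 2: activation with a step hard limiter
            y = 1 if x1 * w[0] + x2 * w[1] >= theta else 0
            e = yd - y
            if e != 0:
                converged = False
                # Step 3: weight training by the delta rule
                w[0] += alpha * x1 * e
                w[1] += alpha * x2 * e
        if converged:          # Step 4: iterate until convergence
            break
    return w

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(AND)
outputs = [1 if x1 * w[0] + x2 * w[1] >= 0.2 else 0 for (x1, x2), _ in AND]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop terminates with weights that reproduce the truth table.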
  25. Two-dimensional plots of basic logical operations: (a) AND (x1 ∧ x2); (b) OR (x1 ∨ x2); (c) Exclusive-OR (x1 ⊕ x2). A perceptron can learn the operations AND and OR, but not Exclusive-OR, because no single straight line can separate the two Exclusive-OR classes.
  26. Multilayer neural networks. A multilayer perceptron is a feedforward neural network with one or more hidden layers. The network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and an output layer of computational neurons. The input signals are propagated in a forward direction on a layer-by-layer basis.
  27. Figure: Multilayer perceptron with two hidden layers; input signals pass from the input layer through the first and second hidden layers to the output layer.
  28. What does the middle layer hide? A hidden layer "hides" its desired output. Neurons in the hidden layer cannot be observed through the input/output behaviour of the network. There is no obvious way to know what the desired output of the hidden layer should be. Commercial ANNs incorporate three and sometimes four layers, including one or two hidden layers. Each layer can contain from 10 to 1000 neurons. Experimental neural networks may have five or even six layers, including three or four hidden layers, and utilise millions of neurons.
  29. Back-propagation neural network. Learning in a multilayer network proceeds the same way as for a perceptron. A training set of input patterns is presented to the network. The network computes its output pattern, and if there is an error, or in other words a difference between actual and desired output patterns, the weights are adjusted to reduce this error.
  30. In a back-propagation neural network, the learning algorithm has two phases. First, a training input pattern is presented to the network input layer. The network propagates the input pattern from layer to layer until the output pattern is generated by the output layer. If this pattern is different from the desired output, an error is calculated and then propagated backwards through the network from the output layer to the input layer. The weights are modified as the error is propagated.
  31. Figure: Three-layer back-propagation neural network; input signals x1, x2, ..., xn pass through hidden-layer weights w_ij to hidden neurons 1..m, then through weights w_jk to output neurons 1..l, producing outputs y1, y2, ..., yl, while error signals travel in the opposite direction.
  32. The back-propagation training algorithm. Step 1: Initialisation. Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range: (−2.4 / F_i, +2.4 / F_i), where F_i is the total number of inputs of neuron i in the network. The weight initialisation is done on a neuron-by-neuron basis.
  33. Step 2: Activation. Activate the back-propagation neural network by applying inputs x1(p), x2(p), ..., xn(p) and desired outputs yd,1(p), yd,2(p), ..., yd,n(p). (a) Calculate the actual outputs of the neurons in the hidden layer: y_j(p) = sigmoid[ Σ_{i=1..n} x_i(p) · w_ij(p) − θ_j ], where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid activation function.
  34. Step 2: Activation (continued). (b) Calculate the actual outputs of the neurons in the output layer: y_k(p) = sigmoid[ Σ_{j=1..m} x_jk(p) · w_jk(p) − θ_k ], where m is the number of inputs of neuron k in the output layer.
  35. Step 3: Weight training. Update the weights in the back-propagation network propagating backward the errors associated with output neurons. (a) Calculate the error gradient for the neurons in the output layer: δ_k(p) = y_k(p) · [1 − y_k(p)] · e_k(p), where e_k(p) = y_d,k(p) − y_k(p). Calculate the weight corrections: Δw_jk(p) = α · y_j(p) · δ_k(p). Update the weights at the output neurons: w_jk(p + 1) = w_jk(p) + Δw_jk(p).
  36. Step 3: Weight training (continued). (b) Calculate the error gradient for the neurons in the hidden layer: δ_j(p) = y_j(p) · [1 − y_j(p)] · Σ_{k=1..l} δ_k(p) · w_jk(p). Calculate the weight corrections: Δw_ij(p) = α · x_i(p) · δ_j(p). Update the weights at the hidden neurons: w_ij(p + 1) = w_ij(p) + Δw_ij(p).
  37. Step 4: Iteration. Increase iteration p by one, go back to Step 2 and repeat the process until the selected error criterion is satisfied. As an example, we may consider the three-layer back-propagation network. Suppose that the network is required to perform the logical operation Exclusive-OR. Recall that a single-layer perceptron could not do this operation. Now we will apply the three-layer net.
  38. Figure: Three-layer network for solving the Exclusive-OR operation; inputs x1 and x2 feed hidden neurons 3 and 4 through weights w13, w14, w23 and w24, and the hidden neurons feed output neuron 5 through weights w35 and w45; neurons 3, 4 and 5 have thresholds θ3, θ4 and θ5.
  39. The effect of the threshold applied to a neuron in the hidden or output layer is represented by its weight, θ, connected to a fixed input equal to −1. The initial weights and threshold levels are set randomly as follows: w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = −1.2, w45 = 1.1, θ3 = 0.8, θ4 = −0.1 and θ5 = 0.3.
  40. We consider a training set where inputs x1 and x2 are equal to 1 and desired output yd,5 is 0. The actual outputs of neurons 3 and 4 in the hidden layer are calculated as: y3 = sigmoid(x1 w13 + x2 w23 − θ3) = 1 / (1 + e^−(1·0.5 + 1·0.4 − 0.8)) = 0.5250 and y4 = sigmoid(x1 w14 + x2 w24 − θ4) = 1 / (1 + e^−(1·0.9 + 1·1.0 + 0.1)) = 0.8808. Now the actual output of neuron 5 in the output layer is determined as: y5 = sigmoid(y3 w35 + y4 w45 − θ5) = 1 / (1 + e^−(−0.5250·1.2 + 0.8808·1.1 − 0.3)) = 0.5097. Thus, the following error is obtained: e = yd,5 − y5 = 0 − 0.5097 = −0.5097.
  41. The next step is weight training. To update the weights and threshold levels in our network, we propagate the error, e, from the output layer backward to the input layer. First, we calculate the error gradient for neuron 5 in the output layer: δ5 = y5 (1 − y5) e = 0.5097 · (1 − 0.5097) · (−0.5097) = −0.1274. Then we determine the weight corrections assuming that the learning rate parameter, α, is equal to 0.1: Δw35 = α · y3 · δ5 = 0.1 · 0.5250 · (−0.1274) = −0.0067; Δw45 = α · y4 · δ5 = 0.1 · 0.8808 · (−0.1274) = −0.0112; Δθ5 = α · (−1) · δ5 = 0.1 · (−1) · (−0.1274) = 0.0127.
  42. Next we calculate the error gradients for neurons 3 and 4 in the hidden layer: δ3 = y3 (1 − y3) · δ5 · w35 = 0.5250 · (1 − 0.5250) · (−0.1274) · (−1.2) = 0.0381 and δ4 = y4 (1 − y4) · δ5 · w45 = 0.8808 · (1 − 0.8808) · (−0.1274) · 1.1 = −0.0147. We then determine the weight corrections: Δw13 = α · x1 · δ3 = 0.1 · 1 · 0.0381 = 0.0038; Δw23 = α · x2 · δ3 = 0.1 · 1 · 0.0381 = 0.0038; Δθ3 = α · (−1) · δ3 = 0.1 · (−1) · 0.0381 = −0.0038; Δw14 = α · x1 · δ4 = 0.1 · 1 · (−0.0147) = −0.0015; Δw24 = α · x2 · δ4 = 0.1 · 1 · (−0.0147) = −0.0015; Δθ4 = α · (−1) · δ4 = 0.1 · (−1) · (−0.0147) = 0.0015.
  43. At last, we update all weights and thresholds: w13 = 0.5 + 0.0038 = 0.5038; w14 = 0.9 − 0.0015 = 0.8985; w23 = 0.4 + 0.0038 = 0.4038; w24 = 1.0 − 0.0015 = 0.9985; w35 = −1.2 − 0.0067 = −1.2067; w45 = 1.1 − 0.0112 = 1.0888; θ3 = 0.8 − 0.0038 = 0.7962; θ4 = −0.1 + 0.0015 = −0.0985; θ5 = 0.3 + 0.0127 = 0.3127. The training process is repeated until the sum of squared errors is less than 0.001.
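The whole worked example (forward pass, error gradients, weight corrections) can be checked numerically. This sketch follows the slides' notation, with the minus signs on w35 and θ4 that the transcript dropped.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial weights and thresholds from the slides
w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
t3, t4, t5 = 0.8, -0.1, 0.3
x1 = x2 = 1          # training inputs
yd5 = 0              # desired output for Exclusive-OR on (1, 1)
alpha = 0.1          # learning rate

# Forward pass (Step 2)
y3 = sigmoid(x1 * w13 + x2 * w23 - t3)   # approx. 0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)   # approx. 0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)   # approx. 0.5097
e = yd5 - y5                             # approx. -0.5097

# Backward pass (Step 3): error gradients
d5 = y5 * (1 - y5) * e                   # approx. -0.1274
d3 = y3 * (1 - y3) * d5 * w35            # approx.  0.0381
d4 = y4 * (1 - y4) * d5 * w45            # approx. -0.0147

# Weight and threshold corrections (threshold input is fixed at -1)
dw35 = alpha * y3 * d5                   # approx. -0.0067
dw45 = alpha * y4 * d5                   # approx. -0.0112
dt5 = alpha * -1 * d5                    # approx.  0.0127
dw13 = alpha * x1 * d3                   # approx.  0.0038

# One updated weight as a spot check
w35_new = w35 + dw35                     # approx. -1.2067
```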
  44. Figure: Learning curve for operation Exclusive-OR; the sum-squared network error falls from about 10^0 to below 10^−3 over 224 epochs.
  45. Final results of three-layer network learning. Table: inputs (1, 1), (0, 1), (1, 0), (0, 0) with desired outputs yd = 0, 1, 1, 0 produce actual outputs y5 = 0.0155, 0.9849, 0.9849, 0.0175, giving a sum of squared errors of 0.0010.
  46. Figure: Network represented by the McCulloch-Pitts model for solving the Exclusive-OR operation; hidden neurons 3 and 4 each receive x1 and x2 with weights +1.0, neuron 3 has threshold +1.5 and neuron 4 has threshold +0.5, and output neuron 5 receives −2.0 from neuron 3 and +1.0 from neuron 4 with threshold +0.5.
  47. Decision boundaries: (a) decision boundary constructed by hidden neuron 3: x1 + x2 − 1.5 = 0; (b) decision boundary constructed by hidden neuron 4: x1 + x2 − 0.5 = 0; (c) decision boundaries constructed by the complete three-layer network.
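A quick way to confirm that this fixed-weight McCulloch-Pitts network really computes Exclusive-OR is to evaluate it on all four input pairs, using the weights and thresholds read off the figure:

```python
def step(x):
    return 1 if x >= 0 else 0

def xor_net(x1, x2):
    """Hidden neuron 3 acts as AND, neuron 4 as OR; the output neuron
    combines them as (OR and not AND), i.e. Exclusive-OR."""
    y3 = step(x1 + x2 - 1.5)                # decision boundary x1 + x2 - 1.5 = 0
    y4 = step(x1 + x2 - 0.5)                # decision boundary x1 + x2 - 0.5 = 0
    y5 = step(-2.0 * y3 + 1.0 * y4 - 0.5)   # output neuron with threshold 0.5
    return y5

truth_table = [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```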
  48. Accelerated learning in multilayer neural networks. A multilayer network learns much faster when the sigmoidal activation function is represented by a hyperbolic tangent: Y_tanh = 2a / (1 + e^−bX) − a, where a and b are constants. Suitable values for a and b are: a = 1.716 and b = 0.667.
  49. We can also accelerate training by including a momentum term in the delta rule: Δw_jk(p) = β · Δw_jk(p − 1) + α · y_j(p) · δ_k(p), where β is a positive number (0 ≤ β < 1) called the momentum constant. Typically, the momentum constant is set to 0.95. This equation is called the generalised delta rule.
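The generalised delta rule is a one-line update. This sketch (function and argument names are mine) shows where the acceleration comes from: when successive gradients point the same way, the momentum term makes each correction larger than the last.

```python
def delta_with_momentum(prev_delta, y_j, delta_k, alpha=0.1, beta=0.95):
    # Generalised delta rule: beta * previous correction + plain delta-rule term
    return beta * prev_delta + alpha * y_j * delta_k

# Two updates in the same direction: the second correction is nearly double the first
d1 = delta_with_momentum(0.0, 1.0, 1.0)   # 0.1 * 1.0 * 1.0 = 0.1
d2 = delta_with_momentum(d1, 1.0, 1.0)    # 0.95 * 0.1 + 0.1 = 0.195
```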
  50. Figure: Learning with momentum for operation Exclusive-OR; training converges in 126 epochs.
  51. Learning with adaptive learning rate. To accelerate the convergence and yet avoid the danger of instability, we can apply two heuristics. Heuristic 1: if the change of the sum of squared errors has the same algebraic sign for several consecutive epochs, then the learning rate parameter, α, should be increased. Heuristic 2: if the algebraic sign of the change of the sum of squared errors alternates for several consecutive epochs, then the learning rate parameter, α, should be decreased.
  52. Adapting the learning rate requires some changes in the back-propagation algorithm. If the sum of squared errors at the current epoch exceeds the previous value by more than a predefined ratio (typically 1.04), the learning rate parameter is decreased (typically by multiplying by 0.7) and new weights and thresholds are calculated. If the error is less than the previous one, the learning rate is increased (typically by multiplying by 1.05).
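These rules fit in a small helper; the function and argument names are mine. A caller would also discard the freshly computed weights whenever the error grew by more than the ratio, as the slide describes.

```python
def adapt_learning_rate(alpha, sse, prev_sse, ratio=1.04, down=0.7, up=1.05):
    """Adjust the learning rate from this epoch's and last epoch's
    sum of squared errors, using the typical constants from the slide."""
    if sse > prev_sse * ratio:
        return alpha * down   # error grew too much: cut the learning rate
    if sse < prev_sse:
        return alpha * up     # error fell: grow the learning rate
    return alpha              # otherwise leave it unchanged

shrunk = adapt_learning_rate(0.1, sse=1.10, prev_sse=1.0)   # 0.1 * 0.7  = 0.07
grown = adapt_learning_rate(0.1, sse=0.90, prev_sse=1.0)    # 0.1 * 1.05 = 0.105
```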
  53. Figure: Learning with adaptive learning rate; training converges in 103 epochs.
  54. Figure: Learning with momentum and adaptive learning rate; training converges in 85 epochs.
  55. The Hopfield Network. Neural networks were designed on an analogy with the brain. The brain's memory, however, works by association. For example, we can recognise a familiar face even in an unfamiliar environment within 100-200 ms. We can also recall a complete sensory experience, including sounds and scenes, when we hear only a few bars of music. The brain routinely associates one thing with another.
  56. Multilayer neural networks trained with the back-propagation algorithm are used for pattern recognition problems. However, to emulate the human memory's associative characteristics we need a different type of network: a recurrent neural network. A recurrent neural network has feedback loops from its outputs to its inputs. The presence of such loops has a profound impact on the learning capability of the network.
  57. The stability of recurrent networks intrigued several researchers in the 1960s and 1970s. However, none was able to predict which network would be stable, and some researchers were pessimistic about finding a solution at all. The problem was solved only in 1982, when John Hopfield formulated the physical principle of storing information in a dynamically stable network.
  58. Figure: Single-layer n-neuron Hopfield network; the output of each neuron 1..n is fed back to the inputs of all the other neurons.
  59. The Hopfield network uses McCulloch and Pitts neurons with the sign activation function as its computing element: Y_sign = +1 if X > 0; −1 if X < 0; Y (unchanged) if X = 0.
  60. The current state of the Hopfield network is determined by the current outputs of all neurons, y1, y2, ..., yn. Thus, for a single-layer n-neuron network, the state can be defined by the state vector: Y = [y1, y2, ..., yn]^T.
  61. In the Hopfield network, synaptic weights between neurons are usually represented in matrix form as follows: W = Σ_{m=1..M} Y_m Y_m^T − M · I, where M is the number of states to be memorised by the network, Y_m is the n-dimensional binary vector, I is the n × n identity matrix, and superscript T denotes matrix transposition.
  62. Figure: Possible states for the three-neuron Hopfield network; the eight corners (±1, ±1, ±1) of a cube in (y1, y2, y3) space.
  63. The stable state-vertex is determined by the weight matrix W, the current input vector X, and the threshold matrix θ. If the input vector is partially incorrect or incomplete, the initial state will converge into the stable state-vertex after a few iterations. Suppose, for instance, that our network is required to memorise two opposite states, (1, 1, 1) and (−1, −1, −1). Thus, Y1 = [1 1 1]^T and Y2 = [−1 −1 −1]^T, or Y1^T = [1 1 1] and Y2^T = [−1 −1 −1], where Y1 and Y2 are the three-dimensional vectors.
  64. The 3 × 3 identity matrix I has ones on the diagonal and zeros elsewhere. Thus, we can now determine the weight matrix as follows: W = Y1 Y1^T + Y2 Y2^T − 2I, which gives the rows [0 2 2], [2 0 2], [2 2 0]. Next, the network is tested by the sequence of input vectors, X1 and X2, which are equal to the output (or target) vectors Y1 and Y2, respectively.
  65. First, we activate the Hopfield network by applying the input vector X. Then, we calculate the actual output vector Y, and finally, we compare the result with the initial input vector X: Y1 = sign(W X1 − θ) = [1 1 1]^T and Y2 = sign(W X2 − θ) = [−1 −1 −1]^T, with θ = 0.
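The two-memory example can be verified with a short sketch. One detail is my choice, not the slides': the update below is asynchronous (one neuron at a time, keeping the old output when the net input is zero, per the sign function above), which is a common Hopfield convention. With θ = 0 it reproduces the weight matrix and both stable states, and also shows the error-correction behaviour described on the next slide.

```python
def hopfield_weights(memories):
    """W = sum_m Ym Ym^T - M*I for +/-1 memory vectors."""
    n = len(memories[0])
    W = [[0] * n for _ in range(n)]
    for y in memories:
        for i in range(n):
            for j in range(n):
                W[i][j] += y[i] * y[j]
    for i in range(n):
        W[i][i] -= len(memories)   # subtract M on the diagonal
    return W

def recall(W, x, sweeps=5):
    """Asynchronous updates with the sign function; zero net input
    leaves the neuron's previous output unchanged."""
    y = list(x)
    for _ in range(sweeps):
        for i in range(len(y)):
            net = sum(W[i][j] * y[j] for j in range(len(y)))
            if net > 0:
                y[i] = 1
            elif net < 0:
                y[i] = -1
    return y

W = hopfield_weights([[1, 1, 1], [-1, -1, -1]])
corrected = recall(W, [-1, 1, 1])   # single-error state is attracted to (1, 1, 1)
```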
  66. The remaining six states are all unstable. However, stable states (also called fundamental memories) are capable of attracting states that are close to them. The fundamental memory (1, 1, 1) attracts unstable states (−1, 1, 1), (1, −1, 1) and (1, 1, −1). Each of these unstable states represents a single error, compared to the fundamental memory (1, 1, 1). The fundamental memory (−1, −1, −1) attracts unstable states (−1, −1, 1), (−1, 1, −1) and (1, −1, −1). Thus, the Hopfield network can act as an error correction network.
  67. Storage capacity of the Hopfield network. Storage capacity is the largest number of fundamental memories that can be stored and retrieved correctly. The maximum number of fundamental memories M_max that can be stored in the n-neuron recurrent network is limited by M_max = 0.15 n.
  68. Bidirectional associative memory (BAM). The Hopfield network represents an autoassociative type of memory: it can retrieve a corrupted or incomplete memory but cannot associate this memory with another different memory. Human memory is essentially associative. One thing may remind us of another, and that of another, and so on. We use a chain of mental associations to recover a lost memory. If we forget where we left an umbrella, we try to recall where we last had it, what we were doing, and who we were talking to. We attempt to establish a chain of associations, and thereby to restore a lost memory.
  69. To associate one memory with another, we need a recurrent neural network capable of accepting an input pattern on one set of neurons and producing a related, but different, output pattern on another set of neurons. Bidirectional associative memory (BAM), first proposed by Bart Kosko, is a heteroassociative network. It associates patterns from one set, set A, to patterns from another set, set B, and vice versa. Like a Hopfield network, the BAM can generalise and also produce correct outputs despite corrupted or incomplete inputs.
  70. Figure: BAM operation. (a) Forward direction: input-layer neurons 1..n with states x_i(p) drive output-layer neurons 1..m with states y_j(p). (b) Backward direction: the output states y_j(p) drive the input layer to produce x_i(p + 1).
  71. The basic idea behind the BAM is to store pattern pairs so that when n-dimensional vector X from set A is presented as input, the BAM recalls m-dimensional vector Y from set B, but when Y is presented as input, the BAM recalls X.
  72. To develop the BAM, we need to create a correlation matrix for each pattern pair we want to store. The correlation matrix is the matrix product of the input vector X and the transpose of the output vector, Y^T. The BAM weight matrix is the sum of all correlation matrices, that is: W = Σ_{m=1..M} X_m Y_m^T, where M is the number of pattern pairs to be stored in the BAM.
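A minimal BAM sketch built on this weight matrix; the two stored pattern pairs below are my own illustration, not from the slides.

```python
def bam_weights(pairs):
    """W = sum_m Xm Ym^T, an n x m correlation matrix."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    W = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(m):
                W[i][j] += x[i] * y[j]
    return W

def sign_vec(v):
    return [1 if s >= 0 else -1 for s in v]

def recall_forward(W, x):
    # Present X on set A, recall Y from set B: Y = sign(W^T X)
    return sign_vec([sum(W[i][j] * x[i] for i in range(len(x)))
                     for j in range(len(W[0]))])

def recall_backward(W, y):
    # Present Y on set B, recall X from set A: X = sign(W Y)
    return sign_vec([sum(W[i][j] * y[j] for j in range(len(y)))
                     for i in range(len(W))])

pairs = [([1, -1, 1], [1, 1]), ([-1, 1, -1], [-1, -1])]
W = bam_weights(pairs)
```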
  73. Stability and storage capacity of the BAM. The BAM is unconditionally stable. This means that any set of associations can be learned without risk of instability. The maximum number of associations to be stored in the BAM should not exceed the number of neurons in the smaller layer. The more serious problem with the BAM is incorrect convergence. The BAM may not always produce the closest association. In fact, a stable association may be only slightly related to the initial input vector.