Course notes (.ppt)


  1. Chapter 19: Neural Networks
     • Neuroanatomy
     • Computing in neural nets
     • Network architectures
       – Perceptron, Hopfield, recurrent, multi-layer, feed forward
     • Functions that networks can represent
       – Linear vs. non-linear separable functions
     • Training multilayer feed forward networks
       – Backpropagation
  2. Neuroanatomy
     • Neural networks (aka connectionist, PDP, artificial neural networks)
       – Rough approximation to the animal nervous system
       – See systems such as NEURON for modeling at more biological levels of detail; http://neuron.duke.edu/
     • Neuron components
       – Soma (cell body); dendritic tree
       – Axon: sends signal downstream
       – Synapses
         • Receive incoming signals from upstream neurons
         • Connections on dendrites, cell body, axon, synapses
         • Neurotransmitter mechanisms
     • Process: 1) synapses receive signals and change the potential of the cell body; 2) when a limit is reached, the neuron "fires" and an electrical signal (action potential) is sent down the axon
  3. Neural Nets & Neurons: What is represented by a neuron?
     • Cell body sums electrical potentials from incoming signals
       – Serves as an accumulator function over time
       – But "as a rule many impulses must reach a neuron almost simultaneously to make it fire" (p. 33, Brodal, 1992; italics added)
     • Synapses have varying effects on cell potential
       – Idea of synaptic strength; influence of a synapse on the downstream neuron
     • Rough approximation in ANN models
       – No direct model of the accumulator function
       – Synaptic strength: approximated with connection weights
       – Spiking of output: approximated with non-linear activation functions
  4. Summed Weighted Input
     in_i = Σ_j W_{j,i} a_j = W_i a_i
     • W_{j,i}: weight (scalar) connecting unit j to unit i; unit j is in the "left" layer, unit i is in the "right" layer
     • a_j: activation value (scalar) of unit j; this is in the layer to the "left"
     • W_i: row vector of incoming weights for unit i
     • a_i: column vector of activation values of the units connected to unit i
  5. Example of Summed Weighted Input Computation
     [Diagram: units 1–4 in the "left" layer connected to unit 1 in the "right" layer by weights W_{1,1}, W_{2,1}, W_{3,1}, W_{4,1}]
     W_1 = [W_{1,1} W_{2,1} W_{3,1} W_{4,1}]
     a_1 = [a_1, a_2, a_3, a_4]^T
     W_1 a_1 = [W_{1,1} W_{2,1} W_{3,1} W_{4,1}] [a_1, a_2, a_3, a_4]^T
     Recall: multiplying an n×r matrix with an r×m matrix produces an n×m matrix C, where each element C_{i,j} is the scalar product of row i of the left matrix and column j of the right matrix.
  6. Scalar Result: Summed Weighted Input
     W_1 a_1 = [W_{1,1} W_{2,1} W_{3,1} W_{4,1}] [a_1, a_2, a_3, a_4]^T = W_{1,1} a_1 + W_{2,1} a_2 + W_{3,1} a_3 + W_{4,1} a_4
     (a 1×4 row vector times a 4×1 column vector gives a 1×1 matrix, i.e., a scalar)
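
As a rough sketch (not from the slides), the summed weighted input for a single unit can be computed with NumPy; the weight and activation values below are invented for illustration:

    import numpy as np

    # W_1: 1x4 row vector of incoming weights for unit 1 (values are made up)
    W_1 = np.array([[0.5, -0.2, 0.1, 0.7]])

    # a_1: 4x1 column vector of activations in the "left" layer (values are made up)
    a_1 = np.array([[1.0], [0.3], [0.8], [0.5]])

    # Summed weighted input: (1x4) @ (4x1) -> 1x1 matrix, i.e., a scalar
    in_1 = W_1 @ a_1
    print(in_1.item())   # about 0.87
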
  7. Computing New Activation Value
     new_value = g(W_{1,1} a_1 + W_{2,1} a_2 + W_{3,1} a_3 + W_{4,1} a_4)
     where g(x) is the activation function, e.g., the sigmoid function
     For the case we were considering: new_value = g(W_1 a_1)
     In the general case: new_value = g(W_i a_i)
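
A minimal sketch of this step in Python, assuming the sigmoid activation function named on the slide; the numeric weighted input is carried over from the sketch above:

    import numpy as np

    def sigmoid(x):
        """Sigmoid activation function g(x) = 1 / (1 + e^(-x))."""
        return 1.0 / (1.0 + np.exp(-x))

    in_1 = 0.87                  # summed weighted input from the previous sketch
    new_value = sigmoid(in_1)    # new activation value for unit 1
    print(new_value)             # about 0.70
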
  8. Weight Matrix
     • A row vector provides the weights for a single unit in the "right" layer
     • A weight matrix provides all weights connecting the "left" layer to the "right" layer
     • Let W be an n×r weight matrix
       – Row vector i in the matrix connects unit i in the "right" layer to the units in the "left" layer
       – n units in the layer to the "right"
       – r units in the layer to the "left"
     • a_i: still the vector of activation values of the layer to the "left"; an r×1 column vector
     • W a_i: n×1 column vector; summed weighted inputs for the "right" layer
     • g(W a_i): n×1 column vector; new activation values for the "right" layer
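
As an illustrative sketch (all values invented), updating a whole layer is then one matrix–vector product followed by an element-wise activation:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # W: n x r weight matrix (n = 3 units on the "right", r = 4 units on the "left"); made-up values
    W = np.array([[ 0.5, -0.2,  0.1,  0.7],
                  [ 0.3,  0.8, -0.5,  0.2],
                  [-0.1,  0.4,  0.9, -0.3]])

    # a: r x 1 column vector of "left" layer activations (made-up values)
    a = np.array([[1.0], [0.3], [0.8], [0.5]])

    summed = W @ a                       # n x 1 summed weighted inputs for the "right" layer
    new_activations = sigmoid(summed)    # n x 1 new activation values for the "right" layer
    print(new_activations)
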
  9. Example
     • Updating hidden layer activation values: g(W a), with a 5×2 weight matrix W and a 2×1 column vector a of activation values
     • Updating output activation values: g(W a), with a 3×5 weight matrix W and a 5×1 column vector a of activation values
     • Exercise: draw the architecture (units and arcs representing weights) of the connectionist model
  10. Answer
      • 2 input units
      • 5 hidden layer units
      • 3 output units
      • Fully connected, feedforward network
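
A minimal sketch of the forward pass for this 2–5–3 architecture, assuming sigmoid activations; the weight and input values here are placeholders, not the ones on the example slide:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)

    W_hidden = rng.normal(size=(5, 2))    # 5x2: input -> hidden weights (placeholder values)
    W_output = rng.normal(size=(3, 5))    # 3x5: hidden -> output weights (placeholder values)

    a_input  = np.array([[0.5], [1.0]])   # 2x1 input activations (made up)
    a_hidden = sigmoid(W_hidden @ a_input)    # 5x1 hidden layer activations
    a_output = sigmoid(W_output @ a_hidden)   # 3x1 output layer activations
    print(a_output)
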
  11. Thresholds
      • A threshold can be built into the g(x) activation function
      • The threshold is the point along the x axis at which the unit "fires"
      • Instead of a threshold constant, each unit can be given an extra input with constant activation −1
        – With a corresponding extra weight w_t for the unit
      [Diagram: a unit with incoming weights w_1, w_2, …, w_n and threshold t, next to an equivalent unit with an extra input fixed at −1 and extra weight w_t in place of the threshold]
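
A small sketch of the trick described above (illustrative values only): the threshold t is removed from the activation function and replaced by an extra input fixed at −1 with weight w_t = t.

    import numpy as np

    def step(x, threshold=0.0):
        """Threshold (step) activation: fire (1) when x reaches the threshold."""
        return 1.0 if x >= threshold else 0.0

    w = np.array([0.6, 0.9])    # ordinary weights (made up)
    a = np.array([1.0, 0.5])    # input activations (made up)
    t = 0.8                     # threshold (made up)

    # Option 1: keep the threshold inside the activation function
    out1 = step(w @ a, threshold=t)

    # Option 2: extra input with constant activation -1 and extra weight w_t = t,
    #           so the activation function can use a fixed threshold of 0
    w_ext = np.append(w, t)      # [w_1, w_2, w_t]
    a_ext = np.append(a, -1.0)   # [a_1, a_2, -1]
    out2 = step(w_ext @ a_ext, threshold=0.0)

    print(out1, out2)            # both give the same answer: 1.0 1.0
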
  12. Network Architectures
      • Perceptron
        – Single layer of weights; typically a single output unit
        – No hidden units
      • Multi-layer
        – One or more layers of hidden units
      • Hopfield network
        – Associative "memory"; single layer of units; each unit connected to every other unit; symmetric weights; particular training algorithm
      • Recurrent networks
        – These are not strictly feedforward networks
        – There are cycles in the network graph; why?
        – Two ideas of "memory"
      • Heterogeneous architectures
        – E.g., some layers with recurrent connections, other layers with strictly feedforward connections
  13. Linear vs. Non-linear Separable Functions
      • Separability
        – E.g., say the network has two inputs, x and y, and one output, and the output is 1 or 0
        – Plot the input pairs as Cartesian points on a graph
        – Question: can you draw a line (y = ax + b) that separates the "1"s from the "0"s?
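
As an illustration (not from the slides), AND is linearly separable while XOR is not; the brute-force check below simply tries to find any line y = ax + b that puts the 1s on one side and the 0s on the other:

    import itertools
    import numpy as np

    def separable_by_line(points):
        """Brute-force search for a line y = a*x + b that separates label-1 points
        from label-0 points (either class may lie above the line)."""
        for a, b in itertools.product(np.linspace(-3, 3, 25), repeat=2):
            above = {label for (x, y), label in points if y > a * x + b}
            below = {label for (x, y), label in points if y <= a * x + b}
            if len(above) <= 1 and len(below) <= 1:   # each side holds only one class
                return a, b
        return None

    AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    print(separable_by_line(AND))   # finds some (a, b): AND is linearly separable
    print(separable_by_line(XOR))   # None: no separating line exists for XOR
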
  14. Backpropagation: A Method of Establishing Weights in an ANN
      1) Using our standard formulas for computing activations, compute all of the network activations ("forward propagation")
      2) Compare the output vector produced by the network to the "correct" output
         – If there is no difference, do nothing
         – If there is a difference, compute an error for each output unit: Err_i = T_i − O_i, where T_i = correct output and O_i = current output
      3) Adjust the weights to reduce the error: assess blame for the error and divide it amongst the weights
  15. Backprop – 2: Weight update rule for the output layer
      W_{j,i} ← W_{j,i} + α · a_j · g'(in_i) · Err_i
      • W_{j,i} = the weight connecting unit j to unit i
      • α = the learning rate parameter (a real number)
      • a_j = activation of the unit in the layer to the "left"
      • g'(in_i) = first derivative of g (slope) at the summed input in_i
      • Err_i = a representation of the error
      The weight change is larger when (a) the learning rate is larger, (b) the activation of unit j is larger, or (c) the error is greater.
      [Diagram: unit j with activation a_j feeding unit i through weight W_{j,i}; unit i produces output O_i]
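
A one-step sketch of this update rule for a single weight (all values invented; the sigmoid slope is written in the output form o(1 − o) used on the next slide):

    alpha  = 0.1      # learning rate (made up)
    a_j    = 0.8      # activation of unit j in the "left" layer (made up)
    o_i    = 0.7047   # current output of unit i, g(in_i) (made up)
    t_i    = 1.0      # target output for unit i (made up)

    err_i  = t_i - o_i              # Err_i = T_i - O_i
    gprime = o_i * (1.0 - o_i)      # sigmoid slope g'(in_i), expressed via the output
    w_ji   = 0.5                    # current weight W_{j,i} (made up)
    w_ji   = w_ji + alpha * a_j * gprime * err_i   # weight update rule from the slide
    print(w_ji)                     # slightly larger than 0.5, since the error is positive
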
  16. Backprop – 3
      g(x) = 1 / (1 + e^(−x))
      g'(x) = g(x)(1 − g(x)); in terms of the unit's output o = g(x), this is o(1 − o)
      [Plot: g'(x) for x from −10 to 10]
      g'(x) is largest near x = 0 and approaches 0 as |x| grows, so the weight change is largest when the unit is not saturated, i.e., it addresses the question: how effective is a change in the weight in the current "context" (the current input sample)?
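
A small sketch of these two functions in Python (the function names are my own):

    import numpy as np

    def sigmoid(x):
        """g(x) = 1 / (1 + e^(-x))"""
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_prime(x):
        """g'(x) = g(x) * (1 - g(x)); largest near x = 0, close to 0 for large |x|."""
        gx = sigmoid(x)
        return gx * (1.0 - gx)

    for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
        print(x, sigmoid(x), sigmoid_prime(x))
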
  17. Backprop – 4: Error term for an output layer unit
      W_{j,i} ← W_{j,i} + α · a_j · g'(in_i) · Err_i
      What should we use for Err_i?
      An output unit has an expected or target output value T_i, so define: Err_i = T_i − O_i
      [Diagram: hidden layer unit j with activation a_j connected by weight W_{j,i} to output layer unit i with output O_i]
  18. Backprop – 5: Error term for a hidden layer unit (error backpropagation)
      A hidden unit has no expected or target output value, so its error is propagated back from the output layer:
      Err_j = Σ_i W_{j,i} · g'(in_i) · Err_i
      The weight update rule stays the same: W_{j,i} ← W_{j,i} + α · a_j · g'(in_i) · Err_i
      [Diagram: hidden layer unit j connected by weights W_{j,i} to output layer units i, each contributing g'(in_i) · Err_i to Err_j]
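
Putting slides 14–18 together, here is a minimal sketch of one backpropagation step for a small feedforward network with one hidden layer, assuming sigmoid activations throughout; all sizes, names, and values are illustrative, not taken from the slides:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(1)
    alpha = 0.5                                # learning rate (made up)

    W_h = rng.normal(scale=0.5, size=(5, 2))   # input -> hidden weights (5x2, placeholder values)
    W_o = rng.normal(scale=0.5, size=(3, 5))   # hidden -> output weights (3x5, placeholder values)

    a_in = np.array([[0.5], [1.0]])            # 2x1 input activations (made up)
    T    = np.array([[1.0], [0.0], [1.0]])     # 3x1 target output vector (made up)

    # 1) Forward propagation
    a_h = sigmoid(W_h @ a_in)                  # 5x1 hidden activations
    O   = sigmoid(W_o @ a_h)                   # 3x1 output activations

    # 2) Error terms
    err_o   = T - O                            # output units: Err_i = T_i - O_i
    delta_o = (O * (1.0 - O)) * err_o          # g'(in_i) * Err_i for each output unit
    err_h   = W_o.T @ delta_o                  # hidden units: Err_j = sum_i W_{j,i} g'(in_i) Err_i
    delta_h = (a_h * (1.0 - a_h)) * err_h      # g'(in_j) * Err_j for each hidden unit

    # 3) Weight updates: W_{j,i} <- W_{j,i} + alpha * a_j * g'(in_i) * Err_i
    W_o += alpha * (delta_o @ a_h.T)           # 3x5 update for hidden -> output weights
    W_h += alpha * (delta_h @ a_in.T)          # 5x2 update for input -> hidden weights

    print(np.round(O.ravel(), 3))              # network outputs before the update
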
