- 2. Basic Neuron Model In A Feedforward Network • Inputs xi arrive through pre-synaptic connections • Synaptic efficacy is modeled using real weights wi • The response of the neuron is a nonlinear function f of its weighted inputs
- 4. Task Plot the following types of neural activation functions. 1(a) Threshold function: φ(v) = +1 for v ≥ 0, 0 for v < 0. 1(b) Threshold function: φ(v) = +1 for v ≥ 0, -1 otherwise. 2 Piecewise-linear function: φ(v) = 1 for v ≥ +1/2, v for +1/2 > v > -1/2, 0 for v ≤ -1/2. 3(a) Sigmoid function: φ(v) = 1/(1 + exp(-λv)). 3(b) Sigmoid function: φ(v) = 2/(1 + exp(-λv)). 3(c) Sigmoid function: φ(v) = tanh(λv). For 3, vary the value of λ and show the changes in the graph.
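As a starting point for the task above, the six functions can be written down directly. This is a minimal sketch using only the standard library; the function names are made up for illustration, and the piecewise-linear function is implemented exactly as stated on the slide. The values it tabulates could be handed to any plotting library.

```python
import math

def threshold_unipolar(v):
    """1(a): +1 for v >= 0, otherwise 0."""
    return 1.0 if v >= 0 else 0.0

def threshold_bipolar(v):
    """1(b): +1 for v >= 0, otherwise -1."""
    return 1.0 if v >= 0 else -1.0

def piecewise_linear(v):
    """2: 1 for v >= 1/2, v for -1/2 < v < 1/2, 0 for v <= -1/2 (as on the slide)."""
    if v >= 0.5:
        return 1.0
    if v > -0.5:
        return v
    return 0.0

def sigmoid_unipolar(v, lam=1.0):
    """3(a): 1 / (1 + exp(-lam * v))."""
    return 1.0 / (1.0 + math.exp(-lam * v))

def sigmoid_scaled(v, lam=1.0):
    """3(b): 2 / (1 + exp(-lam * v))."""
    return 2.0 / (1.0 + math.exp(-lam * v))

def sigmoid_tanh(v, lam=1.0):
    """3(c): tanh(lam * v)."""
    return math.tanh(lam * v)

if __name__ == "__main__":
    # Tabulate 3(a) on v = -3.0, -2.5, ..., 3.0 for several slopes (lambda).
    vs = [x / 2.0 for x in range(-6, 7)]
    for lam in (0.5, 1.0, 5.0):
        curve = [round(sigmoid_unipolar(v, lam), 3) for v in vs]
        print(f"lambda={lam}: {curve}")
```

Larger values of λ make the sigmoid steeper around v = 0, so it looks more and more like the threshold function.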
- 6. Single Layer Artificial Neural Networks
- 9. Banana & Apple Sorter
- 12. Illustration of a Neural Network
- 13. Different networks ☻Perceptron – Feedforward Network, Linear Decision Boundary, One Neuron for Each Decision ☻Hamming Network ☻Hopfield Network - Dynamic Associative Memory Network ☻Error Back Propagation network ☻Radial basis network ☻ART ☻Brain in a box neural network ☻Cellular neural Network ☻Neocognitron ☻Functional
- 15. 1970s The backpropagation algorithm was first proposed by Paul Werbos in the 1970s. However, it was rediscovered in 1986 by Rumelhart, Hinton and Williams and became widely used. It took more than a decade before the error backpropagation (or in short: backprop) algorithm was popularized.
- 17. Differences In Networks Feedforward Networks • Solutions are known • Weights are learned • Evolves in the weight space • Used for: – Prediction – Classification – Function approximation Feedback Networks • Solutions are unknown • Weights are prescribed • Evolves in the state space • Used for: – Constraint satisfaction – Optimization – Feature matching
- 18. Architecture A Back Prop network has at least 3 layers of units: an input layer, at least one intermediate hidden layer, & an output layer. Connection weights in a Back Prop network are one way. Units are connected in a feed-forward fashion, with input units fully connected to units in the hidden layer & hidden units fully connected to units in the output layer. When a Back Prop network is cycled, an input pattern is propagated forward to the output units through the intervening input-to-hidden and hidden-to-output weights.
- 19. Inputs To Neurons • Arise from other neurons or from outside the network • Nodes whose inputs arise outside the network are called input nodes and simply copy values • An input may excite or inhibit the response of the neuron to which it is applied, depending upon the weight of the connection
- 21. Weights • Represent synaptic efficacy and may be excitatory or inhibitory • Normally, positive weights are considered as excitatory while negative weights are thought of as inhibitory • Learning is the process of modifying the weights in order to produce a network that performs some function
- 22. Finding net
- 23. Output • The response function is normally nonlinear • Samples include – Sigmoid: $f(x) = \frac{1}{1 + e^{-\lambda x}}$ – Piecewise linear: $f(x) = x$ if $x \ge \theta$, $f(x) = 0$ if $x < \theta$
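- 24. Back propagation Networks [Figure: network with inputs I1, I2, I3, hidden layer H1, H2 and output layer O1, O2; connection weights Wi,j and Wj,k; the 1's are bias inputs] $H(x) = \frac{1}{1 + e^{-\sum_i w_{i,x} I_i}}$, $O(x) = \frac{1}{1 + e^{-\sum_j w_{j,x} H_j}}$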
- 25. Weight updation
- 26. Backpropagation Preparation • Training Set A collection of input-output patterns that are used to train the network • Testing Set A collection of input-output patterns that are used to assess network performance • Learning Rate-η A scalar parameter, analogous to step size in numerical integration, used to set the rate of adjustments
- 27. Learning • Learning occurs during a training phase in which each input pattern in a training set is applied to the input units and then propagated forward. • The pattern of activation arriving at the output layer is then compared with the correct output pattern to calculate an error signal. • The error signal for each such target output pattern is then back propagated from the outputs to the inputs in order to appropriately adjust the weights in each layer of the network.
- 28. Learning • The process goes on for several cycles till the error reduces to a predefined limit. • After a BackProp network has learned the correct classification for a set of inputs, it can be tested on a second set of inputs to see how well it classifies untrained patterns. • Thus, an important consideration in applying BackProp learning is how well the network generalizes.
- 29. The basic principles of the back propagation algorithm are: (1) the error of the output signal of a neuron is used to adjust its weights such that the error decreases, and (2) the error in hidden layers is estimated proportional to the weighted sum of the (estimated) errors in the layer above.
- 30. Patterns Training patterns (70%) Testing patterns (30%)
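One way to produce such a 70/30 split is sketched below. The helper name `split_patterns`, the shuffling and the fixed seed are assumptions for illustration; the slides do not say how the patterns are divided.

```python
import random

def split_patterns(patterns, train_fraction=0.7, seed=0):
    """Shuffle a copy of the patterns and split it into training and testing sets."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(patterns)
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

if __name__ == "__main__":
    train_set, test_set = split_patterns(range(10))
    print(len(train_set), "training patterns,", len(test_set), "testing patterns")
```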
- 31. During the training, the data is presented to the network several thousand times. For each data sample, the current output of the network is calculated and compared to the "true" target value. The error signal dj of neuron j is computed from the difference between the target and the calculated output. For hidden neurons, this difference is estimated by the weighted error signals of the layer above. The error terms are then used to adjust the weights wij of the neural network.
- 32. A Pseudo-Code Algorithm • Randomly choose the initial weights • While error is too large – For each training pattern (presented in random order) • Apply the inputs to the network • Calculate the output for every neuron from the input layer, through the hidden layer(s), to the output layer • Calculate the error at the outputs • Use the output error to compute error signals for pre-output layers • Use the error signals to compute weight adjustments • Apply the weight adjustments – Periodically evaluate the network performance
- 33. Network Error • Total-Sum-Squared-Error (TSSE): $TSSE = \frac{1}{2} \sum_{\text{patterns}} \sum_{\text{outputs}} (\text{desired} - \text{actual})^2$ • Root-Mean-Squared-Error (RMSE): $RMSE = \sqrt{\frac{2 \cdot TSSE}{\#\text{patterns} \cdot \#\text{outputs}}}$
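The two error measures translate almost line for line into code. This is a minimal sketch that assumes the desired and actual outputs are stored as one list of output values per pattern; the function names are illustrative.

```python
import math

def tsse(desired, actual):
    """Total sum squared error: half the squared differences over all patterns and outputs."""
    total = 0.0
    for d_row, a_row in zip(desired, actual):
        for d, a in zip(d_row, a_row):
            total += (d - a) ** 2
    return 0.5 * total

def rmse(desired, actual):
    """Root mean squared error: sqrt(2 * TSSE / (#patterns * #outputs))."""
    n_patterns = len(desired)
    n_outputs = len(desired[0])
    return math.sqrt(2.0 * tsse(desired, actual) / (n_patterns * n_outputs))

if __name__ == "__main__":
    desired = [[1.0], [0.0]]
    actual = [[0.5], [0.5]]
    print("TSSE:", tsse(desired, actual), "RMSE:", rmse(desired, actual))
```

The factor 2 in the RMSE cancels the 1/2 in the TSSE, so the RMSE is the plain root of the mean squared difference.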
- 35. Apply Inputs From A Pattern • Apply the value of each input parameter to each input node • Input nodes compute only the identity function [Figure: feedforward network from inputs to outputs]
- 36. Calculate Outputs For Each Neuron Based On The Pattern • The output from neuron j for pattern p is Opj, where $O_{pj}(net_j) = \frac{1}{1 + e^{-\lambda \, net_j}}$ and $net_j = bias \cdot W_{bias} + \sum_k O_{pk} W_{jk}$; k ranges over the input indices and Wjk is the weight on the connection from input k to neuron j [Figure: feedforward network from inputs to outputs]
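The forward computation for one pattern can be sketched as follows. The data layout is an assumption made for illustration: `weights[j][k]` holds Wjk, `bias_weights[j]` holds the bias weight of neuron j, and the bias input is a constant 1 as in the network figure.

```python
import math

def sigmoid(net, lam=1.0):
    """Logistic response 1 / (1 + exp(-lam * net))."""
    return 1.0 / (1.0 + math.exp(-lam * net))

def layer_output(inputs, weights, bias_weights, lam=1.0, bias=1.0):
    """Outputs of one layer; weights[j][k] is W_jk from input k to neuron j."""
    outputs = []
    for w_row, w_bias in zip(weights, bias_weights):
        # net_j = bias * W_bias + sum_k O_pk * W_jk
        net = bias * w_bias + sum(o * w for o, w in zip(inputs, w_row))
        outputs.append(sigmoid(net, lam))
    return outputs

def forward(pattern, w_hidden, b_hidden, w_out, b_out, lam=1.0):
    """Propagate one input pattern through the hidden layer to the output layer."""
    hidden = layer_output(pattern, w_hidden, b_hidden, lam)
    return hidden, layer_output(hidden, w_out, b_out, lam)

if __name__ == "__main__":
    # Arbitrary example weights for a 2-2-1 network.
    hidden, out = forward([0.1, 0.9],
                          [[0.2, -0.3], [0.4, 0.1]], [0.1, -0.2],
                          [[0.3, -0.1]], [0.05])
    print("hidden:", hidden, "output:", out)
```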
- 37. Calculate The Error Signal For Each Output Neuron • The output neuron error signal δpj is given by δpj=(Tpj-Opj) Opj (1-Opj) • Tpj is the target value of output neuron j for pattern p • Opj is the actual output value of output neuron j for pattern p
- 38. Calculate The Error Signal For Each Hidden Neuron • The hidden neuron error signal δpj is given by $\delta_{pj} = O_{pj} (1 - O_{pj}) \sum_k \delta_{pk} W_{kj}$, where δpk is the error signal of a post-synaptic neuron k and Wkj is the weight of the connection from hidden neuron j to the post-synaptic neuron k
- 39. Calculate And Apply Weight Adjustments • Compute weight adjustments ∆Wji at time t by ∆Wji(t)= η δpj Opi • Apply weight adjustments according to Wji(t+1) = Wji(t) + ∆Wji(t) • Some add a momentum term α∗∆Wji(t-1)
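Putting the error signals and the weight adjustment together, one update for a single pattern might look like the sketch below. It assumes λ = 1, a single hidden layer, and weight rows whose last entry is the bias weight (fed by a constant 1); the example weights are arbitrary and the momentum term is left out.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_h, w_o):
    """Return inputs-with-bias, hidden-outputs-with-bias and network outputs."""
    xb = list(x) + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h] + [1.0]
    out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_o]
    return xb, hb, out

def backprop_step(x, target, w_h, w_o, eta=0.1):
    """One weight update for one pattern; returns new weight lists."""
    xb, hb, out = forward(x, w_h, w_o)
    # Output error signals: delta_pj = (T_pj - O_pj) * O_pj * (1 - O_pj)
    d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
    # Hidden error signals: delta_pj = O_pj * (1 - O_pj) * sum_k delta_pk * W_kj
    d_hid = [hb[j] * (1.0 - hb[j]) * sum(d_out[k] * w_o[k][j] for k in range(len(w_o)))
             for j in range(len(w_h))]
    # Weight adjustments: W_ji(t+1) = W_ji(t) + eta * delta_pj * O_pi
    new_w_o = [[w + eta * d_out[k] * hb[i] for i, w in enumerate(row)]
               for k, row in enumerate(w_o)]
    new_w_h = [[w + eta * d_hid[j] * xb[i] for i, w in enumerate(row)]
               for j, row in enumerate(w_h)]
    return new_w_h, new_w_o

if __name__ == "__main__":
    w_h = [[0.2, -0.3, 0.1], [0.4, 0.1, -0.2]]   # 2 hidden neurons: 2 inputs + bias
    w_o = [[0.3, -0.1, 0.05]]                    # 1 output neuron: 2 hidden + bias
    print("before:", forward((0.1, 0.9), w_h, w_o)[2])
    w_h, w_o = backprop_step((0.1, 0.9), [0.9], w_h, w_o)
    print("after: ", forward((0.1, 0.9), w_h, w_o)[2])
```

The hidden error signals are computed from the hidden-to-output weights as they were before the update, which is why the function builds new weight lists instead of modifying the old ones in place.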
- 40. • Thus, the network adjusts its weights after each data sample. This learning process is in fact a gradient descent in the error surface of the weight space - with all its drawbacks. The learning algorithm is slow and prone to getting stuck in a local minimum.
- 42. Simulation Issues How to Select Initial Weights Local Minima Solutions to Local minima Rate of Learning Stopping Criterion Initialization
- 43. • For the standard back propagation algorithm, the initial weights of the multi-layer perceptron have to be relatively small. They can, for instance, be selected randomly from a small interval around zero. During training they are slowly adapted. Starting with small weights is crucial, because large weights are rigid and cannot be changed quickly.
- 44. Sequential & Batch modes For a given training set, back-propagation learning proceeds in one of two basic ways: 1. Sequential Mode 2. Batch Mode
- 45. Sequential mode • The sequential mode of back-propagation learning is also referred to as on-line, pattern or stochastic mode. • To be specific, consider an epoch consisting of N training examples arranged in the order (x(1),d(1)),…,(x(N),d(N)). • The first example pair (x(1),d(1)) in the epoch is presented to the network, & the sequence of forward & backward computations described previously is performed, resulting in certain adjustments to the synaptic weights & bias levels of the network. • The second example pair (x(2),d(2)) in the epoch is presented, & the sequence of forward & backward computations is repeated, resulting in further adjustments to the synaptic weights & bias levels. This process is continued until the last example pair (x(N),d(N)) in the epoch is accounted for.
- 46. Batch mode • In this mode of back-propagation learning, weight updating is performed after the presentation of all the training examples that constitute an epoch. • For a particular epoch, the cost function is the average squared error, written in composite form as: $\xi_{av} = \frac{1}{2N} \sum_{n=1}^{N} \sum_{j \in C} e_j^2(n)$, where $e_j(n)$ is the error signal of output neuron j for example n and C is the set of output neurons.
- 47. • Let N denote the total number of patterns contained in the training set. The average squared error energy is obtained by summing ξ(n) over all n and then normalizing with respect to the set size N: $\xi_{av} = \frac{1}{N} \sum_{n=1}^{N} \xi(n)$
- 48. Stopping Criteria • In general, the back-propagation algorithm cannot be shown to converge. • To formulate a criterion, it is logical to think in terms of the unique properties of a local or global minimum. • The back-propagation algorithm is considered to have converged when the Euclidean norm of the gradient vector reaches a sufficiently small gradient threshold. • Alternatively, the back-propagation algorithm is considered to have converged when the absolute rate of change in the average squared error per epoch is sufficiently small. • The drawback of the gradient-norm criterion is that, for successful trials, learning time may be long.
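Both criteria reduce to a simple comparison once the gradient vector and the per-epoch error are available. The sketch below is illustrative only: the function names and the default thresholds are assumptions, not values given in the slides.

```python
import math

def gradient_norm_small(gradient, threshold=1e-4):
    """Criterion 1: Euclidean norm of the gradient vector is below a threshold."""
    return math.sqrt(sum(g * g for g in gradient)) < threshold

def error_change_small(prev_error, curr_error, tolerance=1e-3):
    """Criterion 2: absolute change in average squared error per epoch is small."""
    return abs(curr_error - prev_error) < tolerance

if __name__ == "__main__":
    print(gradient_norm_small([3e-5, 4e-5]))     # norm is 5e-5
    print(error_change_small(0.5000, 0.5004))    # change is 4e-4
```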
- 50. • The back-propagation algorithm makes adjustments by computing the derivative, or slope, of the network error with respect to each weight. It attempts to minimize the overall error by descending this slope to the minimum value for every weight. It advances one step down the slope at each update. If the network takes steps that are too large, it may pass the global minimum. If it takes steps that are too small, it may settle on local minima, or take an inordinate amount of time to arrive at the global minimum. The ideal step size for a given problem requires detailed, high-order derivative analysis, a task not performed by the algorithm.
- 52. Minima • Local minima • Global minima
- 53. Local Minima For simple 2 layer networks (without a hidden layer), the error surface is bowl shaped and using gradient descent to minimize error is not a problem; the network will always find the lowest-error solution (at the bottom of the bowl). Such solutions are called global minima. However, an extra hidden layer implies a complex error surface. Since some minima are deeper than others, it is possible that gradient descent may not find a global minimum. Instead, the network may fall into local minima, which represent suboptimal solutions.
- 54. • The algorithm cycles through the training samples as:- • Initialization • Presentation of training Examples • Forward Computation
- 55. Initialization • Assuming that no prior information is available, pick the synaptic weights and thresholds from a uniform distribution whose mean is zero & whose variance is chosen to make the standard deviation of the induced local fields of the neurons lie at the transition between the linear and saturated parts of the sigmoid activation function.
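A sketch of such an initialization is given below. The slide does not say which variance to use; the code assumes, as one common heuristic, a variance of 1/m for a neuron with m incoming connections. A uniform distribution on [-a, a] has variance a²/3, so that choice gives a = sqrt(3/m). The function name is illustrative.

```python
import math
import random

def init_weights(n_neurons, fan_in, rng):
    """Zero-mean uniform weights with variance 1/fan_in (assumed heuristic)."""
    a = math.sqrt(3.0 / fan_in)   # uniform on [-a, a] has variance a**2 / 3
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(n_neurons)]

if __name__ == "__main__":
    rng = random.Random(0)
    for row in init_weights(n_neurons=2, fan_in=3, rng=rng):
        print([round(w, 3) for w in row])
```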
- 56. Presentation of training Examples Present the network with an epoch of training examples. For each example in the set, ordered in some fashion, perform the sequence of forward and backward computations.
- 57. Solutions to Local minima Usual solution: More hidden units. Logic - Although additional hidden units increase the complexity of the error surface, the extra dimensionality increases the number of possible escape routes. Our solution – Tunneling
- 58. Rate of Learning If the learning rate η is very small, then the algorithm proceeds slowly, but accurately follows the path of steepest descent in weight space. If η is large, the algorithm may oscillate.
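The effect of η can be seen on a toy problem. The sketch below runs gradient descent on the one-dimensional error E(w) = w², which stands in for a real network error surface; the function name and the particular η values are chosen only for illustration.

```python
def descend(eta, steps=10, w0=1.0):
    """Gradient descent on E(w) = w**2, whose gradient is 2*w; returns the path of w."""
    path = [w0]
    w = w0
    for _ in range(steps):
        w = w - eta * 2.0 * w
        path.append(w)
    return path

if __name__ == "__main__":
    for eta in (0.01, 0.9, 1.1):
        print(f"eta={eta}:", [round(w, 3) for w in descend(eta, steps=5)])
```

With η = 0.01 the weight creeps toward the minimum at 0; with η = 0.9 it jumps back and forth across the minimum while still shrinking; with η = 1.1 each jump overshoots by more than the last and the weight moves away from the minimum.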
- 59. A simple method of effectively increasing the rate of learning is to modify the delta rule by including a momentum term: Δwji (n) = α Δwji (n-1) + η δj (n)yi (n) where α is a positive constant termed the momentum constant. This is called the generalized delta rule. The effect is that if the basic delta rule is consistently pushing a weight in the same direction, then it gradually gathers "momentum" in that direction.
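The build-up of momentum can be illustrated with a sketch in which the term δj(n)yi(n) is held constant, an artificial assumption made only to isolate the effect of α. The function name is hypothetical.

```python
def momentum_deltas(gradient_term, eta=0.1, alpha=0.9, steps=50):
    """Successive weight changes under the generalized delta rule.

    gradient_term plays the role of delta_j(n) * y_i(n) and is held constant here.
    """
    deltas = []
    prev = 0.0
    for _ in range(steps):
        # delta_w(n) = alpha * delta_w(n-1) + eta * delta_j(n) * y_i(n)
        prev = alpha * prev + eta * gradient_term
        deltas.append(prev)
    return deltas

if __name__ == "__main__":
    d = momentum_deltas(1.0)
    print("first step:", d[0], "after 50 steps:", round(d[-1], 4))
```

When the push stays in the same direction, the step size grows from η·δy toward η·δy/(1 − α), which is ten times larger for α = 0.9.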
- 61. An Example: Exclusive “OR” • Training set – ((0.1, 0.1), 0.1) – ((0.1, 0.9), 0.9) – ((0.9, 0.1), 0.9) – ((0.9, 0.9), 0.1) • Testing set – Use at least 121 pairs equally spaced on the unit square and plot the results – Omit the training set (if desired)
- 62. An Example (continued): Network Architecture [Figure: network diagram from inputs to output(s)]
- 63. An Example (continued): Network Architecture [Figure: network with sample input (0.1, 0.9), target output 0.9 and three bias inputs of 1]
- 64. Feedforward Network Training by Backpropagation: Process Summary • Select an architecture • Randomly initialize weights • While error is too large – Select training pattern and feedforward to find actual network output – Calculate errors and backpropagate error signals – Adjust weights • Evaluate performance using the test set
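The process summary can be turned into a small program. The following is a sketch rather than a reference implementation: it assumes λ = 1, one hidden layer with two units, sequential updates without momentum, and initial weights drawn from [-0.5, 0.5]; the names `train` and `xor_patterns` and all default parameter values are illustrative. The patterns are the XOR training set from the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_h, w_o):
    """Each weight row ends with a bias weight fed by a constant 1."""
    xb = list(x) + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h] + [1.0]
    out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_o]
    return xb, hb, out

def tsse(patterns, w_h, w_o):
    """Total sum squared error over all patterns and outputs."""
    return 0.5 * sum((t - o) ** 2
                     for x, target in patterns
                     for t, o in zip(target, forward(x, w_h, w_o)[2]))

def train(patterns, n_hidden=2, eta=0.5, max_epochs=5000, goal=0.01, seed=1):
    rng = random.Random(seed)
    n_in, n_out = len(patterns[0][0]), len(patterns[0][1])
    # Randomly initialize weights in a small interval around zero.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    history = [tsse(patterns, w_h, w_o)]
    order = list(patterns)
    # While the error is too large (and the epoch budget is not used up)...
    while history[-1] > goal and len(history) <= max_epochs:
        rng.shuffle(order)                      # present patterns in random order
        for x, target in order:
            xb, hb, out = forward(x, w_h, w_o)  # feedforward
            d_out = [(t - o) * o * (1 - o) for t, o in zip(target, out)]
            d_hid = [hb[j] * (1 - hb[j]) * sum(d_out[k] * w_o[k][j] for k in range(n_out))
                     for j in range(n_hidden)]
            for k in range(n_out):              # adjust hidden-to-output weights
                for j in range(n_hidden + 1):
                    w_o[k][j] += eta * d_out[k] * hb[j]
            for j in range(n_hidden):           # adjust input-to-hidden weights
                for i in range(n_in + 1):
                    w_h[j][i] += eta * d_hid[j] * xb[i]
        history.append(tsse(patterns, w_h, w_o))
    return w_h, w_o, history

xor_patterns = [((0.1, 0.1), (0.1,)), ((0.1, 0.9), (0.9,)),
                ((0.9, 0.1), (0.9,)), ((0.9, 0.9), (0.1,))]

if __name__ == "__main__":
    w_h, w_o, history = train(xor_patterns)
    print(f"epochs: {len(history) - 1}, TSSE: {history[0]:.4f} -> {history[-1]:.4f}")
    for x, target in xor_patterns:
        print(x, "target", target[0], "output", round(forward(x, w_h, w_o)[2][0], 3))
```

Whether the XOR run reaches the error goal depends on the random starting weights, since the network can stall in a local minimum; trying another `seed` or more hidden units is a reasonable response. Evaluating on a separate test set would follow the same pattern, calling `forward` on inputs the network was not trained on.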
- 65. An Example (continued): Network Architecture [Figure: the same network with sample input (0.1, 0.9), nine unknown weights (??), actual output ??? and target output 0.9]
- 67. Backpropagation • Very powerful: given enough hidden units, the network can approximate essentially any continuous function. • Has the usual problem of generalization vs. memorization: with too many units, the network tends to memorize the training inputs and not generalize well. Some schemes exist to “prune” the neural network.
- 68. BackProp networks are not limited in their use, because they can adapt their weights to acquire new knowledge. BackProp networks learn by example, and can be used to make predictions.
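- 69. Write a program to train and simulate a neural network for the following cases – Input nodes = 2 and output nodes = 1 – Input nodes = 3 and output nodes = 1

  Case 1 (inputs A, B; output Y):

  | A | B | Y |
  |---|---|---|
  | 0 | 0 | 0 |
  | 0 | 1 | 1 |
  | 1 | 0 | 1 |
  | 1 | 1 | 0 |

  Case 2 (inputs A, B, C; output Y):

  | A | B | C | Y |
  |---|---|---|---|
  | 0 | 0 | 0 | 0 |
  | 0 | 0 | 1 | 0 |
  | 0 | 1 | 0 | 0 |
  | 0 | 1 | 1 | 0 |
  | 1 | 0 | 0 | 1 |
  | 1 | 0 | 1 | 1 |
  | 1 | 1 | 0 | 1 |
  | 1 | 1 | 1 | 1 |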
- 70. • Artificial Neural Network – Simon Haykin • Artificial Neural Network – Jacek Zurada