

Published in: Education


  3. 4. Perceptron <ul><li>A single artificial neuron that computes a weighted sum of its inputs and applies a threshold activation function. </li></ul><ul><li>It is also called a TLU (Threshold Logic Unit). </li></ul><ul><li>It effectively separates the input space into two categories by the hyperplane: </li></ul><ul><ul><li>$w^T x + b = 0$ </li></ul></ul>
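A minimal sketch of such a threshold unit in Python. The function name `tlu`, the convention that the unit fires only when the net input is strictly above 0, and the AND weights in the usage lines are illustrative assumptions, not taken from the slides.

```python
def tlu(weights, inputs, bias):
    """Threshold logic unit: output 1 when w.x + b is above 0, else 0."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if net > 0.0 else 0


# Hypothetical weights that place the hyperplane x1 + x2 - 1.5 = 0
# between (1, 1) and the other three binary inputs (Boolean AND).
and_outputs = [tlu((1.0, 1.0), x, -1.5) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Points on one side of the hyperplane map to 1 and points on the other side map to 0, which is the two-category split described above.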
  4. 5. History of Artificial Neural Networks <ul><li>McCulloch and Pitts (1943): first neural network model </li></ul><ul><li>Hebb (1949): proposed a mechanism for learning in which the synaptic weight between two neurons increases when one neuron repeatedly activates the other across that synapse (lacked the inhibitory connection) </li></ul><ul><li>Rosenblatt (1958): Perceptron network and the associated learning rule </li></ul><ul><li>Widrow & Hoff (1960): a new learning algorithm for linear neural networks (ADALINE) </li></ul><ul><li>Minsky and Papert (1969): widely influential book about the limitations of single-layer perceptrons, after which research on NNs largely came to a halt. </li></ul><ul><li>Some work that still went on: </li></ul><ul><ul><li>Anderson, Kohonen (1972): Use of ANNs as associative memory </li></ul></ul><ul><ul><li>Grossberg (1980): Adaptive Resonance Theory </li></ul></ul><ul><ul><li>Hopfield (1982): Hopfield Network </li></ul></ul><ul><ul><li>Kohonen (1982): Self-organizing maps </li></ul></ul><ul><li>Rumelhart, Hinton and Williams (1986): Backpropagation algorithm for training multilayer feed-forward networks. Started a resurgence of NN research. </li></ul>
  5. 6. <ul><li>Error-correcting Learning. </li></ul><ul><li>Associative Learning. </li></ul>
  6. 7. Types of Learning • Supervised Learning: Network is provided with a set of examples of proper network behavior (inputs/targets) • Reinforcement Learning: Network is only provided with a grade, or score, which indicates network performance • Unsupervised Learning: Only network inputs are available to the learning algorithm. Network learns to categorize (cluster) the inputs.
  7. 8. Error-correcting Learning <ul><li>1. Perceptron </li></ul><ul><li>2. Delta Rule </li></ul><ul><li>3. Error Backpropagation </li></ul>
  8. 9. Decision Boundary • All points on the decision boundary have the same inner product (= -b) with the weight vector. • Therefore they have the same projection onto the weight vector, so they must lie on a line orthogonal to the weight vector. $w^T p = \|w\| \, \|p\| \cos\theta$, and the projection of $p$ onto $w$ is $\|p\| \cos\theta = w^T p / \|w\|$. [Figure: vector p, weight vector w, and the projection of p onto w]
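The projection argument can be checked numerically. The sketch below picks a hypothetical weight vector `w = (3, 4)` and bias `b = -10` (both chosen only for illustration) and three points that satisfy w.p + b = 0, then computes the length of each point's projection onto w.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def projection_length(p, w):
    """Signed length of the projection of p onto w: w.p / ||w||."""
    return dot(w, p) / math.sqrt(dot(w, w))


w = (3.0, 4.0)   # hypothetical weight vector
b = -10.0        # hypothetical bias
boundary_points = [(2.0, 1.0), (-2.0, 4.0), (6.0, -2.0)]  # each satisfies w.p + b = 0

inner_products = [dot(w, p) for p in boundary_points]
projections = [projection_length(p, w) for p in boundary_points]
```

Every point has inner product -b with w and the same projection length -b / ||w||, so the points differ only in the direction orthogonal to w.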
  9. 10. Two layers <ul><li>Binary nodes that take values 0 or 1 </li></ul><ul><li>Continuous weights, initially chosen randomly </li></ul>
  10. 12. Input Layer — A vector of predictor variable values ( x1...xp ) is presented to the input layer. The input layer (or processing before the input layer) standardizes these values so that the range of each variable is -1 to 1. The input layer distributes the values to each of the neurons in the hidden layer. In addition to the predictor variables, there is a constant input of 1.0, called the bias that is fed to each of the hidden layers; the bias is multiplied by a weight and added to the sum going into the neuron.
  11. 13. Hidden Layer — Arriving at a neuron in the hidden layer, the value from each input neuron is multiplied by a weight ( wji ), and the resulting weighted values are added together producing a combined value uj . The weighted sum ( uj ) is fed into a transfer function, σ, which outputs a value hj . The outputs from the hidden layer are distributed to the output layer.
  12. 14. Output Layer: Arriving at a neuron in the output layer, the value from each hidden layer neuron is multiplied by a weight ( wkj ), and the resulting weighted values are added together producing a combined value vk . The weighted sum ( vk ) is fed into a transfer function, σ, which outputs a value yk .
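The three layer descriptions above can be sketched as one forward pass. This is an illustration under stated assumptions: the slides do not name the transfer function σ, so a logistic sigmoid is assumed; inputs are scaled to [-1, 1] by min-max scaling with known ranges; and the constant bias input of 1.0 is also fed to the output layer. All function names and the example weights are hypothetical.

```python
import math


def sigmoid(u):
    """Assumed transfer function (the slides only call it sigma)."""
    return 1.0 / (1.0 + math.exp(-u))


def scale(x, lo, hi):
    """Map x from the range [lo, hi] onto [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def layer(inputs, weights):
    """One row of `weights` per neuron; the last entry of each row
    multiplies the constant bias input 1.0."""
    extended = list(inputs) + [1.0]
    return [sigmoid(sum(w * v for w, v in zip(row, extended))) for row in weights]


def forward(x, ranges, w_hidden, w_output):
    scaled = [scale(v, lo, hi) for v, (lo, hi) in zip(x, ranges)]  # input layer
    h = layer(scaled, w_hidden)   # hidden outputs h_j
    y = layer(h, w_output)        # network outputs y_k
    return h, y


# Two predictors, two hidden neurons, one output neuron (illustrative weights).
h, y = forward(
    x=(5.0, 20.0),
    ranges=[(0.0, 10.0), (0.0, 40.0)],
    w_hidden=[[0.5, -0.5, 0.0], [1.0, 1.0, 0.0]],
    w_output=[[1.0, -1.0, 0.0]],
)
```

Both predictors sit at the middle of their ranges, so they scale to 0, every weighted sum is 0, and each sigmoid returns 0.5.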
  13. 16. Learning Problem To Be Solved <ul><li>Suppose we have an input pattern (0, 1) </li></ul><ul><li>We have a single target output pattern (1) </li></ul><ul><li>We have a net input of -0.1, which gives an output pattern of (0) </li></ul><ul><li>How could we adjust the weights so that this situation is remedied and the spontaneous output matches our target output pattern (1)? </li></ul>
  14. 17. Answer <ul><li>Increase the weights, so that the net input exceeds 0.0 </li></ul><ul><li>E.g., add 0.2 to all weights </li></ul><ul><li>Observation: the weight from an input node with activation 0 has no effect on the net input </li></ul><ul><li>So we will leave it alone </li></ul>
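The arithmetic of this example can be traced in a few lines. The sketch assumes the target output is 1, as the advice to push the net input above 0.0 implies, and that there is no separate bias term. The net input of -0.1 fixes the second weight at -0.1; the first weight (0.4 here) is not given in the slides and is a hypothetical value.

```python
def respond(weights, pattern):
    """Return (net input, thresholded output) for one output node."""
    net = sum(w * x for w, x in zip(weights, pattern))
    return net, (1 if net > 0.0 else 0)


pattern = (0, 1)
weights = [0.4, -0.1]   # 0.4 is hypothetical; -0.1 reproduces the net input

net_before, out_before = respond(weights, pattern)

# Option from the slide: add 0.2 to all weights.
raised = [w + 0.2 for w in weights]
net_raised, out_raised = respond(raised, pattern)

# Observation from the slide: only the weight from the active input matters,
# so the weight from the inactive input can be left alone.
selective = [w + 0.2 * x for w, x in zip(weights, pattern)]
net_selective, out_selective = respond(selective, pattern)
```

Both adjustments give the same net input for this pattern, because the first input contributes 0 regardless of its weight.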
  15. 18. Perceptron algorithm in words <ul><li>For each node in the output layer: </li></ul><ul><li>Calculate the error (target minus output), which can only take the values -1, 0 and 1 </li></ul><ul><li>If the error is 0, the goal has been achieved. Otherwise, we adjust the weights </li></ul><ul><li>Do not alter weights from inactivated input nodes </li></ul><ul><li>Increase the weight if the error was 1, decrease it if the error was -1 </li></ul>
  16. 19. Perceptron algorithm in rules <ul><li>Weight change = some small constant * (target activation - spontaneous output activation) * input activation </li></ul><ul><li>If we speak of error instead of the “target activation minus the spontaneous output activation”, we have </li></ul><ul><li>Weight change = some small constant * error * input activation </li></ul>
  17. 21. Perceptron Learning Rule (Summary) <ul><li>How do we find the weights using a learning procedure? </li></ul><ul><li>1 - Choose initial weights randomly </li></ul><ul><li>2 - Present a randomly chosen pattern x </li></ul><ul><li>3 - Update weights using Delta rule: </li></ul><ul><li>$w_{ij}(t+1) = w_{ij}(t) + \mathrm{err}_i \, x_j$ </li></ul><ul><li>where $\mathrm{err}_i = \mathrm{target}_i - \mathrm{output}_i$ </li></ul><ul><li>4 - Repeat steps 2 and 3 until the stopping criterion (convergence, max number of iterations) is reached </li></ul>
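The four steps can be sketched as a short training loop for a single output node. Several details are assumptions rather than part of the slides: the bias is handled as an extra weight on a constant input of 1.0, the `rate` parameter defaults to 1.0 to match the update formula as written, the random generator is seeded for repeatability, and Boolean AND is used as the training set.

```python
import random


def predict(weights, x):
    net = sum(w * xi for w, xi in zip(weights, x))
    return 1 if net > 0.0 else 0


def train_perceptron(patterns, rate=1.0, max_iters=10000, seed=0):
    rng = random.Random(seed)
    n = len(patterns[0][0]) + 1                            # +1 for the bias input
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # step 1
    for _ in range(max_iters):                             # step 4: repeat
        # Stopping criterion: every pattern is already classified correctly.
        if all(predict(weights, x + (1.0,)) == t for x, t in patterns):
            break
        x, target = rng.choice(patterns)                   # step 2
        x = x + (1.0,)
        err = target - predict(weights, x)                 # step 3
        weights = [w + rate * err * xi for w, xi in zip(weights, x)]
    return weights


AND = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0), ((1.0, 1.0), 1)]
weights = train_perceptron(AND)
outputs = [predict(weights, x + (1.0,)) for x, _ in AND]
```

Because AND is linearly separable, the loop is expected to stop well before `max_iters`, leaving `outputs` equal to the AND targets.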
  18. 22. Perceptron Convergence theorem <ul><li>If a pattern set can be represented by a two-layer perceptron, the perceptron learning rule will always be able to find some correct weights </li></ul>
  19. 23. Perceptron Limitations <ul><li>A single layer perceptron can only learn linearly separable problems. </li></ul><ul><ul><li>Boolean AND function is linearly separable, whereas Boolean XOR function (and the parity problem in general) is not. </li></ul></ul>
  20. 24. Linear Separability Boolean AND Boolean XOR
  21. 25. Perceptron Limitations Linear Decision Boundary Linearly Inseparable Problems
  22. 26. Apple/Banana Example - Self Study Training Set Random Initial Weights First Iteration $e = t_1 - a = 1 - 0 = 1$
  23. 29. The Perceptron was a Big Hit <ul><li>Spawned the first wave in “CONNECTIONISM” </li></ul><ul><li>Great interest and optimism about the future of neural networks </li></ul><ul><li>First neural network hardware was built in the late fifties and early sixties </li></ul>
  25. 34. XOR problem XOR (exclusive OR) is addition mod 2: 0+0=0, 1+0=1, 0+1=1, 1+1=2=0 mod 2. A perceptron does not work here: a single layer generates a linear decision boundary.
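The failure on XOR can be observed directly. The sketch below applies the perceptron update (weight change = rate * error * input) to the four XOR patterns, cycling through them in a fixed order from zero initial weights, with the bias treated as a weight on a constant input of 1. These choices and the function names are illustrative assumptions.

```python
def predict(weights, x):
    net = sum(w * xi for w, xi in zip(weights, x))
    return 1 if net > 0.0 else 0


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def mistakes_per_epoch(patterns, epochs=50, rate=1.0):
    """Count the misclassified presentations in each pass over the patterns."""
    weights = [0.0, 0.0, 0.0]
    history = []
    for _ in range(epochs):
        mistakes = 0
        for x, target in patterns:
            x = x + (1,)                      # append the constant bias input
            err = target - predict(weights, x)
            if err != 0:
                mistakes += 1
                weights = [w + rate * err * xi for w, xi in zip(weights, x)]
        history.append(mistakes)
    return history


history = mistakes_per_epoch(XOR)
```

An epoch without mistakes would mean one fixed set of weights classifies all four patterns, which no linear boundary can do for XOR, so every entry of `history` stays at 1 or more however many epochs are run.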
  26. 35. Minsky & Papert (1969) offered a solution to the XOR problem by combining perceptron unit responses using a second layer of units. [Figure: network diagram with units 1, 2 and 3 and weights +1, +1]
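One way to see how a second layer resolves XOR is to set the weights by hand. In the sketch below the first-layer units compute OR and AND, and the second-layer unit fires when OR is on and AND is off. These weights and thresholds are illustrative assumptions and are not claimed to be the ones in the slide's figure.

```python
def step(net):
    return 1 if net > 0.0 else 0


def xor_two_layer(x1, x2):
    # First layer: two threshold units reading the raw inputs.
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)    # fires if at least one input is 1
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)   # fires only if both inputs are 1
    # Second layer: combine the first-layer responses.
    return step(1.0 * h_or - 1.0 * h_and - 0.5)


xor_outputs = [xor_two_layer(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Each unit on its own still draws a linear boundary. The combination works because the first layer re-represents the inputs so that the second layer faces a linearly separable problem.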
  27. 36. Two-layer networks: inputs $x_1, x_2, \dots, x_n$ (inputs $x_i$); 1st layer weights $v_{ij}$ from j to i; outputs of 1st layer $z_i$; 2nd layer weights $w_{ij}$ from j to i; outputs $y_1, \dots, y_m$ (outputs $y_j$)
  28. 37. Multilayer Perceptron Architecture
  29. 38. Training Multilayer Perceptron Networks <ul><li>The goal of the training process is to find the set of weight values that will cause the output from the neural network to match the actual target values as closely as possible. There are several issues involved in designing and training a multilayer perceptron network: </li></ul><ul><li>Selecting how many hidden layers to use in the network. </li></ul><ul><li>Deciding how many neurons to use in each hidden layer. </li></ul><ul><li>Finding a globally optimal solution that avoids local minima. </li></ul><ul><li>Converging to an optimal solution in a reasonable period of time. </li></ul><ul><li>Validating the neural network to test for overfitting. </li></ul>
  32. 43. Cybernetics and brain simulation There is no consensus on how closely the brain should be simulated. In the 1940s and 1950s, a number of researchers explored the connection between neurology, information theory, and cybernetics. Some of them built machines that used electronic networks to exhibit rudimentary intelligence, such as W. Grey Walter's turtles and the Johns Hopkins Beast. Many of these researchers gathered for meetings of the Teleological Society at Princeton University and the Ratio Club in England. [24] By 1960, this approach was largely abandoned, although elements of it would be revived in the 1980s.
  33. 45. General intelligence Most researchers hope that their work will eventually be incorporated into a machine with general intelligence (known as strong AI), combining all the skills above and exceeding human abilities at most or all of them. [12] A few believe that anthropomorphic features like artificial consciousness or an artificial brain may be required for such a project. [74] Many of the problems above are considered AI-complete: to solve one problem, you must solve them all. For example, even a straightforward, specific task like machine translation requires that the machine follow the author's argument (reason), know what is being talked about (knowledge), and faithfully reproduce the author's intention (social intelligence). Machine translation, therefore, is believed to be AI-complete: it may require strong AI to be done as well as humans can do it. [75]
  34. 47. Military: High-performance fighter aircraft Some important conclusions from the work were as follows: Speech recognition has definite potential for reducing pilot workload, but this potential was not realized consistently. Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful; with lower recognition rates, pilots would not use the system. More natural vocabulary and grammar, and shorter training times would be useful, but only if very high recognition rates could be maintained.