Published in: Education

  1. 1. Artificial Neural Networks
  2. 2. What is a Neural Network? <ul><li>Definition: An artificial neural network is a computer program that can recognize patterns in a given collection of data and produce a model for that data. It resembles the brain in two respects: </li></ul><ul><ul><li>Knowledge is acquired by the network through a learning process (trial and error). </li></ul></ul><ul><ul><li>Inter-neuron connection strengths, known as synaptic weights, are used to store the knowledge. </li></ul></ul>
  3. 3. Demonstration <ul><li>Demonstration of a neural network used within an optical character recognition (OCR) application. </li></ul>
  4. 4. Neural Network Structure <ul><li>Artificial neuron </li></ul><ul><li>Network refers to the interconnections between the neurons in the different layers of each system. </li></ul><ul><li>The most basic system has three layers. </li></ul><ul><li>The first layer has input neurons, which send data via synapses to the second layer of neurons and then via more synapses to the third layer of output neurons. </li></ul><ul><li>The synapses store parameters called &quot;weights&quot;, which are used to manipulate the data in the calculations. </li></ul>[Figure: artificial neuron; inputs weighted by w_1, w_2, ..., w_n are summed (Σ) and passed through F(net) to produce the output]
  5. 5. <ul><li>The network function f(x) is a composition of other functions g_i(x), which can themselves be defined as compositions of other functions. </li></ul><ul><li>This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables. </li></ul>
  6. 6. <ul><li>This figure depicts a decomposition of “f”, with dependencies between variables indicated by arrows. These can be interpreted in two ways. </li></ul><ul><li>Functional view : the input is transformed into a 3-dimensional vector “h” , which is then transformed into a 2-dimensional vector “g” , which is finally transformed into “f” </li></ul><ul><li>Probabilistic view : the random variable F=f(G) depends upon the random variable G=g(H) , which depends upon H=h(X), which depends upon the random variable X. This view is most commonly encountered in the context of graphical models. </li></ul>
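The functional view above can be sketched in a few lines of Python. This is only an illustration: the slides do not give the input dimension, the weights, or the form of h, g and f, so the scalar input, the tanh nonlinearity and every numeric weight below are assumptions chosen to match the stated shapes (3-dimensional h, 2-dimensional g).

```python
import math

def h(x):
    """Map a scalar input x to a 3-dimensional vector h (hypothetical weights)."""
    return [math.tanh(0.5 * x), math.tanh(-1.0 * x), math.tanh(2.0 * x)]

def g(h_vec):
    """Map the 3-dimensional vector h to a 2-dimensional vector g."""
    weights = [[1.0, 0.0, -1.0],   # hypothetical weights, one row per component of g
               [0.5, 0.5, 0.5]]
    return [math.tanh(sum(w * v for w, v in zip(row, h_vec))) for row in weights]

def f(g_vec):
    """Map the 2-dimensional vector g to the scalar network output."""
    return 0.3 * g_vec[0] - 0.7 * g_vec[1]

def network(x):
    """The network function as a composition: f(g(h(x)))."""
    return f(g(h(x)))
```

Calling `network(1.0)` evaluates the three stages in the order of the arrows in the figure: input to h, h to g, g to f.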
  7. 7. Why do we use Neural Networks? <ul><li>Their ability to represent both linear and non-linear relationships. </li></ul><ul><li>Their ability to learn these relationships directly from the data being modeled. </li></ul><ul><li>Although computing these days is truly advanced, there are certain tasks that a program written for a common microprocessor is unable to perform. </li></ul><ul><li>There are different architectures, which consequently require different types of algorithms, but despite appearing to be a complex system, a neural network is relatively simple. </li></ul>
  8. 8. Advantages of ANN <ul><li>A neural network can perform tasks that a linear program cannot. </li></ul><ul><li>When an element of the neural network fails, the network can continue to operate because of its parallel nature. </li></ul><ul><li>A neural network learns and does not need to be reprogrammed. </li></ul><ul><li>It can be applied in a wide range of applications. </li></ul>
  9. 9. Disadvantages of ANN <ul><li>The neural network needs training to operate. </li></ul><ul><li>The architecture of a neural network is different from the architecture of microprocessors and therefore needs to be emulated. </li></ul><ul><li>Large neural networks require high processing time. </li></ul>
  10. 10. How do Neural Networks Work? <ul><li>Train the Network </li></ul><ul><ul><li>Present the data to the network. </li></ul></ul><ul><ul><li>Network computes an output. </li></ul></ul><ul><ul><li>Network output is compared to the desired output. </li></ul></ul><ul><ul><li>Network weights are modified to reduce the error. </li></ul></ul><ul><li>Use the Network </li></ul><ul><ul><li>Present new data to the network. </li></ul></ul><ul><ul><li>Network computes an output based on its training. </li></ul></ul>
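The train-then-use cycle on this slide can be sketched with the smallest possible network, a single linear unit. The function names, the learning rate, the number of epochs and the sample data are all assumptions made for illustration; the slides do not prescribe a particular update rule, and the simple error-proportional update used here is just one common choice.

```python
def train(samples, learning_rate=0.1, epochs=500):
    """Train a single linear unit y = w * x + b on (input, desired output) pairs."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, desired in samples:            # present the data to the network
            output = w * x + b                # network computes an output
            error = desired - output          # compare output with desired output
            w += learning_rate * error * x    # modify the weights to reduce the error
            b += learning_rate * error
    return w, b

def predict(w, b, x):
    """Use the trained network: compute an output for new data."""
    return w * x + b

# Hypothetical training set generated from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train(samples)
```

After training, `predict(w, b, 3.0)` applies the learned weights to an input that was not in the training set.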
  11. 11. Mathematical Model <ul><li>The major aspects of a parallel distributed model are: </li></ul><ul><li>a set of processing units ('neurons,' 'cells'); </li></ul><ul><li>a state of activation y_k for every unit, equivalent to the output of the unit; </li></ul><ul><li>connections between the units; </li></ul><ul><li>a propagation rule; </li></ul><ul><li>an activation function Φ_k; </li></ul><ul><li>an external input (aka bias, offset) θ_k for each unit; </li></ul><ul><li>a method for information gathering (the learning rule); </li></ul><ul><li>an environment within which the system must operate. </li></ul>
  12. 12. Contd… <ul><li>Connections between units </li></ul><ul><li>Assume that each unit provides an additive contribution to the input of the unit to which it is connected. </li></ul><ul><li>The total input to unit k is simply the weighted sum of the separate outputs from each of the connected units plus a bias or offset term θ_k. </li></ul><ul><li>A positive w_jk is considered excitation and a negative w_jk inhibition. </li></ul><ul><li>Units with this propagation rule are called sigma units. </li></ul><ul><li>\( s_k(t) = \sum_j w_{jk}(t)\, y_j(t) + \theta_k \) </li></ul>
  13. 13. Contd… <ul><li>Summing the weighted inputs is referred to as linear combination. </li></ul><ul><li>Finally, an activation function controls the amplitude of the output of the neuron. </li></ul><ul><li>An acceptable range of output is usually between 0 and 1, or -1 and 1. </li></ul><ul><li>The output of the neuron, y_k, is therefore the result of applying some activation function to the summed input, v_k. </li></ul>
  14. 14. Contd… Mathematically, this process is described in this figure
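Since the figure for this slide is not available, the model described in slides 12 and 13 can be written out as a short sketch: the total input to unit k is the weighted sum of the connected units' outputs plus the bias θ_k, and the unit's output y_k is an activation function applied to that sum. The function names are hypothetical, and the logistic sigmoid is used only as one example of an activation with range 0 to 1.

```python
import math

def total_input(weights, inputs, bias):
    """Weighted sum of the connected units' outputs y_j plus the bias theta_k."""
    return sum(w_jk * y_j for w_jk, y_j in zip(weights, inputs)) + bias

def sigmoid(s):
    """Logistic sigmoid: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

def unit_output(weights, inputs, bias, activation=sigmoid):
    """Output y_k of a unit: the activation function applied to its total input."""
    return activation(total_input(weights, inputs, bias))
```

For example, `total_input([0.5, -1.0], [2.0, 1.0], 0.25)` combines one excitatory (positive) and one inhibitory (negative) connection with a bias of 0.25.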
  15. 15. Activation Functions <ul><li>Three common types of activation function, denoted by Φ(.), are: </li></ul><ul><ul><li>Threshold function: takes on a value of 0 if the summed input is less than a certain threshold value (v), and the value 1 if the summed input is greater than or equal to the threshold value. </li></ul></ul>
  16. 16. Contd… <ul><ul><li>Piecewise-linear function: this function can again take on the values 0 or 1, but can also take on values in between, depending on the amplification factor in a certain region of linear operation. </li></ul></ul>
  17. 17. Contd… <ul><ul><li>Sigmoid function: this function can range between 0 and 1, but it is also sometimes useful to use the -1 to 1 range. An example of a sigmoid function with the -1 to 1 range is the hyperbolic tangent function. </li></ul></ul>
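The three activation functions described in slides 15 to 17 might look as follows in code. The slides give no formulas, so the details are assumptions: the threshold defaults to 0, the piecewise-linear function is taken to be a line of slope `gain` centered on 0.5 and clipped to the range 0 to 1, and the 0-to-1 sigmoid is taken to be the logistic function.

```python
import math

def threshold(v, theta=0.0):
    """1 if the summed input v reaches the threshold theta, otherwise 0."""
    return 1.0 if v >= theta else 0.0

def piecewise_linear(v, gain=1.0):
    """Linear with slope `gain` around v = 0, saturating at 0 and 1."""
    return min(1.0, max(0.0, gain * v + 0.5))

def logistic(v):
    """Sigmoid with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def tanh_sigmoid(v):
    """Sigmoid with range (-1, 1): the hyperbolic tangent."""
    return math.tanh(v)
```

A larger `gain` narrows the linear region of `piecewise_linear`, so the function behaves more and more like the threshold function.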
  18. 18. Training of Artificial Neural Networks <ul><li>A neural network has to be configured such that the application of a set of inputs produces (either directly or via a relaxation process) the desired set of outputs. </li></ul><ul><li>One way is to set the weights explicitly, using a priori knowledge. </li></ul><ul><li>The other way is to ‘train’ the neural network by feeding it teaching patterns and letting it change its weights according to some learning rule. </li></ul>
  19. 19. Paradigms of Learning <ul><li>Supervised learning or Associative learning </li></ul><ul><li>In supervised learning, we are given a set of example pairs (x, y), with x ∈ X and y ∈ Y, and the aim is to find a function f: X → Y in the allowed class of functions that matches the examples. </li></ul><ul><li>We wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data, and it implicitly contains prior knowledge about the problem domain. </li></ul>
  20. 20. <ul><li>A commonly used cost is the mean-squared error, which measures the average squared error between the network's output, f(x), and the target value “y” over all the example pairs; training tries to minimize it. </li></ul><ul><li>When one tries to minimize this cost using gradient descent for the class of neural networks called Multi-Layer Perceptrons, one obtains the common and well-known backpropagation algorithm for training neural networks. </li></ul>
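As a rough illustration of the two points above, the sketch below computes the mean-squared error of a very small multi-layer perceptron and takes one gradient-descent step, with the gradients obtained by applying the chain rule backwards through the network (backpropagation). Everything concrete here is an assumption: the architecture (scalar input, two tanh hidden units, one linear output), the parameter layout `(w, b, v, c)`, the learning rate and the data.

```python
import math

def forward(params, x):
    """Return (hidden activations, output) for a scalar input x."""
    w, b, v, c = params
    hidden = [math.tanh(wi * x + bi) for wi, bi in zip(w, b)]
    output = sum(vi * hi for vi, hi in zip(v, hidden)) + c
    return hidden, output

def mse(params, data):
    """Mean-squared error between f(x) and the target y over all example pairs."""
    return sum((forward(params, x)[1] - y) ** 2 for x, y in data) / len(data)

def gradients(params, data):
    """Backpropagate the error to get the gradient of the MSE for each parameter."""
    w, b, v, c = params
    n = len(data)
    gw, gb, gv, gc = [0.0] * len(w), [0.0] * len(b), [0.0] * len(v), 0.0
    for x, y in data:
        hidden, output = forward(params, x)
        d_out = 2.0 * (output - y) / n              # dE/d(output)
        gc += d_out
        for i, hi in enumerate(hidden):
            gv[i] += d_out * hi
            d_pre = d_out * v[i] * (1.0 - hi * hi)  # through tanh: 1 - tanh^2
            gb[i] += d_pre
            gw[i] += d_pre * x
    return gw, gb, gv, gc

def gradient_step(params, data, learning_rate=0.05):
    """Move every parameter a small step against its gradient."""
    w, b, v, c = params
    gw, gb, gv, gc = gradients(params, data)
    step = lambda values, grads: [p - learning_rate * g for p, g in zip(values, grads)]
    return step(w, gw), step(b, gb), step(v, gv), c - learning_rate * gc

# Hypothetical example pairs (x, y) and initial parameters.
data = [(-1.0, -0.8), (0.0, 0.1), (1.0, 0.9)]
params = ([0.5, -0.3], [0.1, 0.2], [0.7, -0.4], 0.0)
```

Repeating `params = gradient_step(params, data)` is the training loop; with a sufficiently small learning rate each step lowers `mse(params, data)`.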
  21. 21. Contd… <ul><li>Unsupervised learning or Self-organization </li></ul><ul><li>An (output) unit is trained to respond to clusters of patterns within the input. </li></ul><ul><li>In this paradigm the system is supposed to discover statistically salient features of the input population. </li></ul><ul><li>Unlike the supervised learning paradigm, there is no a priori set of categories into which the patterns are to be classified; rather, the system must develop its own representation of the input stimuli. </li></ul>
  22. 22. <ul><li>In unsupervised learning, we are given some data “x” and a cost function to be minimized, which can be any function of the data “x” and the network's output, “f”. </li></ul><ul><li>The cost function is dependent on the task (what we are trying to model) and our a priori assumptions (the implicit properties of our model, its parameters and the observed variables). </li></ul>
  23. 23. Modifying patterns of connectivity <ul><li>Hebbian learning rule </li></ul><ul><li>Suggested by Hebb in his classic book The Organization of Behavior (Hebb, 1949). </li></ul><ul><li>The basic idea is that if two units j and k are active simultaneously, their interconnection must be strengthened. If j receives input from k, the simplest version of Hebbian learning prescribes modifying the weight w_jk with: \( \Delta w_{jk} = \gamma\, y_j\, y_k \) </li></ul><ul><li>γ is a positive constant of proportionality representing the learning rate. </li></ul>
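A minimal sketch of the Hebbian update, assuming the standard form in which the weight change is the learning rate times the product of the two units' activations, Δw_jk = γ y_j y_k. The function name and the default learning rate are hypothetical.

```python
def hebbian_update(w_jk, y_j, y_k, gamma=0.1):
    """Strengthen the connection in proportion to the joint activity of units j and k."""
    return w_jk + gamma * y_j * y_k
```

When both units are active the weight grows; when either activation is zero the weight is left unchanged.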
  24. 24. Contd… <ul><li>Widrow-Hoff rule or the delta rule </li></ul><ul><li>Another common rule uses not the actual activation of unit k but the difference between the actual and desired activation for adjusting the weights: \( \Delta w_{jk} = \gamma\, y_j\, (d_k - y_k) \) </li></ul><ul><li>d_k is the desired activation provided by a teacher. </li></ul>
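The delta rule can be sketched the same way, assuming the standard form Δw_jk = γ y_j (d_k − y_k), where d_k is the desired activation supplied by the teacher. The helper names, the single linear unit used in the demonstration loop and the numeric values are assumptions for illustration.

```python
def delta_update(w_jk, y_j, y_k, d_k, gamma=0.1):
    """Adjust the weight in proportion to the error between desired and actual activation."""
    return w_jk + gamma * y_j * (d_k - y_k)

def train_single_weight(y_j, d_k, gamma=0.5, steps=50):
    """Repeatedly apply the delta rule to a linear unit with output y_k = w_jk * y_j."""
    w_jk = 0.0
    for _ in range(steps):
        y_k = w_jk * y_j                      # actual activation of unit k
        w_jk = delta_update(w_jk, y_j, y_k, d_k, gamma)
    return w_jk
```

Unlike the Hebbian rule, the update vanishes once the actual activation matches the desired one, so the weight settles instead of growing without bound.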
  25. 25. THANK YOU