- 1. Artificial Neural Networks
- 2. What is a Neural Network?
  - Definition: An artificial neural network is a computer program that can recognize patterns in a given collection of data and produce a model for that data. It resembles the brain in two respects:
    - Knowledge is acquired by the network through a learning process (trial and error).
    - Inter-neuron connection strengths, known as synaptic weights, are used to store the knowledge.
- 3. Demonstration
  - Demonstration of a neural network used within an optical character recognition (OCR) application.
- 4. Neural Network Structure
  - Artificial neuron.
  - "Network" refers to the interconnections between the neurons in the different layers of each system.
  - The most basic system has three layers.
  - The first layer has input neurons, which send data via synapses to the second layer of neurons, and then via more synapses to the third layer of output neurons.
  - The synapses store parameters called "weights", which are used to manipulate the data in the calculations.
  - (Figure: inputs weighted by w_1, w_2, …, w_n are summed (Σ) and passed through F(net) to produce the output.)
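The three-layer structure described above can be sketched in plain Python. The layer sizes, the example weight values, and the choice of a sigmoid for F(net) are illustrative assumptions, not taken from the slides:

```python
import math

def forward(x, weights, biases):
    """Forward pass through a small fully connected network.

    `weights` is a list of per-layer weight matrices (one row per neuron),
    `biases` a matching list of per-neuron offsets.
    """
    activation = x
    for W, b in zip(weights, biases):
        # Each neuron computes a weighted sum of the previous layer's
        # outputs plus its bias, then applies a sigmoid activation F(net).
        activation = [
            1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(row, activation)) + theta)))
            for row, theta in zip(W, b)
        ]
    return activation

# A 2-input, 3-hidden, 2-output network with arbitrary example weights.
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
W2 = [[0.7, -0.5, 0.2], [0.1, 0.3, -0.6]]
out = forward([1.0, 0.5], [W1, W2], [[0.0] * 3, [0.0] * 2])
```

Data flows exactly as the slide describes: input neurons feed the hidden layer via weighted synapses, which in turn feeds the output layer.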
- 5.
  - The network function f(x) is a composition of other functions g_i(x), which can themselves be defined as compositions of further functions.
  - This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables.
- 6.
  - This figure depicts a decomposition of f, with dependencies between variables indicated by arrows. These can be interpreted in two ways.
  - Functional view: the input x is transformed into a 3-dimensional vector h, which is then transformed into a 2-dimensional vector g, which is finally transformed into f.
  - Probabilistic view: the random variable F = f(G) depends upon the random variable G = g(H), which depends upon H = h(X), which depends upon the random variable X. This view is most commonly encountered in the context of graphical models.
- 7. Why do we use Neural Networks?
  - They can represent both linear and non-linear relationships.
  - They can learn these relationships directly from the data being modeled.
  - Although computing these days is truly advanced, there are certain tasks that a program written for a common microprocessor is unable to perform.
  - There are different architectures, which consequently require different types of algorithms; but despite being an apparently complex system, a neural network is relatively simple.
- 8. Advantages of ANN
  - A neural network can perform tasks that a linear program cannot.
  - When an element of the neural network fails, it can continue without any problem thanks to its parallel nature.
  - A neural network learns and does not need to be reprogrammed.
  - It can be applied to a wide range of applications.
- 9. Disadvantages of ANN
  - The neural network needs training before it can operate.
  - The architecture of a neural network is different from the architecture of microprocessors, and therefore needs to be emulated.
  - Large neural networks require high processing time.
- 10. How do Neural Networks Work?
  - Train the network:
    - Present the data to the network.
    - The network computes an output.
    - The network output is compared to the desired output.
    - The network weights are modified to reduce the error.
  - Use the network:
    - Present new data to the network.
    - The network computes an output based on its training.
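The train/use cycle above can be sketched for a single linear unit. The toy data, the learning rate, and the `train`/`predict` helper names are all hypothetical choices for illustration:

```python
# Train: present data, compute an output, compare it to the desired
# output, and modify the weights to reduce the error.
def train(samples, targets, lr=0.1, epochs=500):
    n = len(samples[0])
    w = [0.0] * n   # synaptic weights
    theta = 0.0     # bias / offset term
    for _ in range(epochs):
        for x, d in zip(samples, targets):
            y = sum(wi * xi for wi, xi in zip(w, x)) + theta  # network output
            err = d - y                                       # desired - actual
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]  # reduce the error
            theta += lr * err
    return w, theta

# Use: present new data; the network computes an output from its training.
def predict(w, theta, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + theta

# Hypothetical toy task: learn y = 2 * x from three examples.
w, theta = train([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])
```

After training, `predict(w, theta, [4.0])` should land close to 8.0, even though that input was never presented.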
- 11. Mathematical Model
  - A set of major aspects of a parallel distributed model can be distinguished:
    - a set of processing units ('neurons', 'cells');
    - a state of activation y_k for every unit, equivalent to the output of the unit;
    - connections between the units;
    - a propagation rule;
    - an activation function Φ_k;
    - an external input (aka bias, offset) θ_k for each unit;
    - a method for information gathering (the learning rule);
    - an environment within which the system must operate.
- 12. Contd…
  - Connections between units:
  - Assume that each unit j provides an additive contribution to the input of the unit k to which it is connected.
  - The total input to unit k is simply the weighted sum of the separate outputs from each of the connected units, plus a bias (offset) term θ_k:
    - s_k(t) = Σ_j w_jk(t) y_j(t) + θ_k(t)
  - A positive w_jk is considered excitation and a negative w_jk inhibition.
  - Units with this propagation rule are called sigma units.
- 13. Contd…
  - The summing up of inputs is referred to as a linear combination.
  - Finally, an activation function controls the amplitude of the output of the neuron.
  - An acceptable range of output is usually between 0 and 1, or between -1 and 1.
  - The output of the neuron, y_k, is therefore the outcome of some activation function applied to the value of the linear combination v_k.
- 14. Contd…
  - Mathematically, this process is described in this figure.
- 15. Activation Functions
  - There are three types of activation functions, denoted by Φ(·):
    - Threshold function: takes the value 0 if the summed input is less than a certain threshold value v, and the value 1 if the summed input is greater than or equal to the threshold value.
- 16. Contd…
  - Piecewise-linear function: this function can again take the values 0 or 1, but can also take values in between, depending on the amplification factor in a certain region of linear operation.
- 17. Contd…
  - Sigmoid function: this function can range between 0 and 1, but it is also sometimes useful to use the -1 to 1 range. An example of a sigmoid-type function is the hyperbolic tangent function.
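The three activation functions above can be written out directly. The default threshold and the amplification factor are illustrative values, not specified by the slides:

```python
import math

def threshold(v, t=0.0):
    # Threshold function: 0 below the threshold t, 1 at or above it.
    return 1.0 if v >= t else 0.0

def piecewise_linear(v, gain=1.0):
    # Piecewise-linear: slope `gain` (the amplification factor) in the
    # middle region, clipped to 0 and 1 outside it.
    return min(1.0, max(0.0, gain * v + 0.5))

def sigmoid(v):
    # Sigmoid: smooth S-shaped curve ranging over (0, 1).
    return 1.0 / (1.0 + math.exp(-v))

def tanh_sigmoid(v):
    # Hyperbolic tangent: a sigmoid-type function over (-1, 1),
    # for when the -1 to 1 output range is preferred.
    return math.tanh(v)
```

All four take the summed input v_k and return the neuron's bounded output y_k.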
- 18. Training of Artificial Neural Networks
  - A neural network has to be configured such that the application of a set of inputs produces (either directly or via a relaxation process) the desired set of outputs.
  - One way is to set the weights explicitly, using a priori knowledge.
  - Another way is to 'train' the neural network by feeding it teaching patterns and letting it change its weights according to some learning rule.
- 19. Paradigms of Learning
  - Supervised learning or associative learning:
  - In supervised learning, we are given a set of example pairs (x, y), with x ∈ X and y ∈ Y, and the aim is to find a function f: X → Y in the allowed class of functions that matches the examples.
  - We wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data, and it implicitly contains prior knowledge about the problem domain.
- 20.
  - A commonly used cost is the mean-squared error, which tries to minimize the average squared error between the network's output f(x) and the target value y over all the example pairs.
  - When one minimizes this cost using gradient descent for the class of neural networks called multi-layer perceptrons, one obtains the common and well-known backpropagation algorithm for training neural networks.
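A minimal sketch of that idea: gradient descent on the mean-squared error of a tiny 1-1-1 multi-layer perceptron, with the chain rule applied by hand (i.e. backpropagation). The initial weights, learning rate, and toy data pairs are hypothetical:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Parameters of a 1-input, 1-hidden, 1-output perceptron (illustrative).
w1, b1, w2, b2 = 0.5, 0.0, 0.5, 0.0
lr = 0.5
data = [(0.0, 0.0), (1.0, 1.0)]  # hypothetical (input, target) pairs

def mse():
    # Mean-squared error between network output f(x) and target y.
    return sum((sigmoid(w2 * sigmoid(w1 * x + b1) + b2) - y) ** 2
               for x, y in data) / len(data)

error_before = mse()
for _ in range(2000):
    for x, y in data:
        h = sigmoid(w1 * x + b1)          # hidden activation
        f = sigmoid(w2 * h + b2)          # network output
        # Backward pass: chain rule gives the gradient of (f - y)^2
        # with respect to each weight; descend along it.
        d_out = 2.0 * (f - y) * f * (1.0 - f)
        d_hid = d_out * w2 * h * (1.0 - h)
        w2 -= lr * d_out * h
        b2 -= lr * d_out
        w1 -= lr * d_hid * x
        b1 -= lr * d_hid
error_after = mse()
```

Each pass propagates the output error backwards through the layers, which is where the algorithm's name comes from.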
- 21. Contd…
  - Unsupervised learning or self-organization:
  - An (output) unit is trained to respond to clusters of patterns within the input.
  - In this paradigm the system is supposed to discover statistically salient features of the input population.
  - Unlike the supervised learning paradigm, there is no a priori set of categories into which the patterns are to be classified; rather, the system must develop its own representation of the input stimuli.
- 22.
  - We are given some data x and a cost function to be minimized, which can be any function of the data x and the network's output f.
  - The cost function is dependent on the task (what we are trying to model) and our a priori assumptions (the implicit properties of our model, its parameters and the observed variables).
- 23. Modifying patterns of connectivity
  - Hebbian learning rule:
  - Suggested by Hebb in his classic book The Organization of Behavior (Hebb, 1949).
  - The basic idea is that if two units j and k are active simultaneously, their interconnection must be strengthened. If j receives input from k, the simplest version of Hebbian learning prescribes modifying the weight w_jk with:
    - Δw_jk = γ y_j y_k
  - γ is a positive constant of proportionality representing the learning rate.
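The Hebbian rule is a one-line update; the function name and the default learning rate below are illustrative:

```python
# Hebbian update: strengthen w_jk when units j and k are active together.
def hebbian_update(w_jk, y_j, y_k, rate=0.1):
    # `rate` is the positive learning-rate constant (gamma in the slide).
    return w_jk + rate * y_j * y_k

w = 0.0
w = hebbian_update(w, 1.0, 1.0)  # both units active: connection strengthened
```

If either unit is inactive (activation 0), the product y_j * y_k vanishes and the weight is left unchanged.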
- 24. Contd…
  - Widrow-Hoff rule or the delta rule:
  - Another common rule uses not the actual activation of unit k but the difference between the actual and desired activation for adjusting the weights:
    - Δw_jk = γ y_j (d_k − y_k)
  - d_k is the desired activation provided by a teacher.
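The delta rule is equally compact; again the helper name and default rate are illustrative:

```python
# Delta (Widrow-Hoff) update: adjust w_jk in proportion to the difference
# between the desired activation d_k and the actual activation y_k.
def delta_update(w_jk, y_j, y_k, d_k, rate=0.1):
    return w_jk + rate * y_j * (d_k - y_k)
```

When the actual output already matches the teacher's desired output, d_k − y_k is zero and the weight stays put; otherwise the weight moves so as to reduce the mismatch.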
- 25. THANK YOU
