International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976-6480 (Print), ISSN 0976-6499 (Online), Volume 5, Issue 3, March (2014), pp. 208-215, © IAEME

SUPERVISED LEARNING IN ARTIFICIAL NEURAL NETWORKS

Dr. Rajeshwari S. Mathad
Department of Electronics, Basaveshwar Science College, Bagalkot, India

ABSTRACT

This review article is about artificial neural networks, their working principle and their optimization. An artificial neural network (ANN) is basically derived from the biological neural network; the key factor in the whole ANN is its weight-updating capacity, which also employs the back-propagation technique to improve its efficiency. I have described two Matlab neural network tools, the fitting tool and the pattern recognition tool from the Neural Network Toolbox, both of which come under the supervised learning technique. I have also described the optimization of the neural network and the different factors involved in it.

Keywords: ANN - Artificial Neural Network.

I. INTRODUCTION

The nerve cell, or neuron, has four general regions, each defined by its physical position in the cell as well as its function. The cell body has two types of interconnection structures which emerge from it: dendrites and the axon. Each neuron generally has only one axon but typically has many dendrites. The axon carries the nerve signal away from the cell body to other neurons, while dendrites carry signals in towards the cell body from the axons of other neurons. As such, the basic nerve cell can be thought of as a "black box" which has a number of inputs (the dendrites) and only one output (the axon). A substance called a neurotransmitter flows across the gap from one neuron to another, thereby acting as a chemical "bridge" for the neural signal; this gap is consequently known as a chemical synapse. The neurotransmitter material consists of sodium and potassium ions, which in turn help in creating a depolarization across the cell membrane near the synapse. The depolarized region occurs over a small portion of the cell membrane, usually along the axon, and this drastic change in polarization is known as the action potential. The action potential propagates down the axon like an electrical pulse radiating out from its source. The cell body thus acts as a summing amplifier which multiplies by an attenuating factor and then sends out the final output.

Similarly, an artificial neural network consists of nodes which perform the same function as a neuron in a nerve cell, and its output and efficiency can be varied by
updating its weights with respect to its inputs and outputs. The weights on the nodes play a key role: every single node is identical, so nodes can be differentiated only by their weights. Usually the weights and biases are initialized randomly, so as not to impose a prejudice on the network. We must therefore have some facility for changing the weights and biases in a systematic fashion. This is referred to as a learning rule, because the process of updating the weights and biases is thought of as training the network. [1-6]

i) BASIC PRINCIPLE OF WEIGHTS

Figure 1: Each input X is attenuated or amplified by a constant weight W

Each input X is attenuated or amplified by a constant weight W, which corresponds to the physical attenuation on a synapse. This describes a single node; when nodes are combined, a bias controls the final output, i.e. the model uses some sort of nonlinearity to capture the thresholding behaviour. The hard-limit function, also referred to as the Heaviside function, is used: it is +1 when its input is greater than zero and 0 when its input is less than or equal to zero. It is usually required to threshold at a definite value rather than simply above or below zero, so a bias b is added to the argument of the hard limit to move its discontinuity along the real axis. Putting this together, the output of this model neuron is

O = fnl(Σ x_i w_i + b)

where fnl is a non-linear transfer function, e.g. the hard-limit function.
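To make the formula concrete, the following is a minimal MATLAB sketch of a single model neuron computing O = fnl(Σ x_i w_i + b) with the hard-limit (Heaviside) transfer function; the input, weight and bias values are illustrative assumptions only.

% Single model neuron with a hard-limit transfer function.
x = [0.5; -1.2; 0.3];            % inputs arriving on the "dendrites"
w = [0.8;  0.4; -0.6];           % synaptic weights (attenuation/amplification)
b = -0.1;                        % bias shifts the discontinuity away from zero
fnl = @(n) double(n > 0);        % hard limit: +1 if n > 0, otherwise 0
O = fnl(w' * x + b);             % single output (the "axon")
disp(O)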
Neurons are generally arranged in parallel to form layers. Each neuron in a layer has the same number of inputs, equal to the total number of inputs to that layer. Therefore a layer of p neurons has a total of n inputs and a total of p outputs, each neuron having n inputs and exactly one output. All the neurons in a layer have the same nonlinear transfer function.

Figure 2: Block diagram of the neural model

Neurons arranged in a layer in this manner are referred to as a perceptron. Inputs to the layer are applied in parallel to all neural inputs simultaneously. For each neuron k in [1, p], each input x_i is multiplied by the corresponding weight w_i, for i in [1, n], and the products are summed. Each sum is then added to its bias b_k and passed through the nonlinear function. At the conclusion of the process the layer outputs appear in parallel. Mathematically, if the inputs are presented as a row vector X, the layer outputs can be found from the matrix product XW + B, which results in a (1 x p) row vector; applying the nonlinear transfer function to each of its elements gives the outputs of the neural layer. Perceptrons are trained in a supervised learning fashion, and they are generally arranged in a strictly feed-forward fashion, with no closed loops or cycles.

The learning rule is then applied to the layer. A simple and widely used learning rule is the Widrow-Hoff rule:

Δ = d(t) − y(t)
w_i(t + 1) = w_i(t) + η Δ x_i(t)
d(t) = +1, if the input is from class A
d(t) = 0, if the input is from class B

where 0 ≤ η ≤ 1 is a positive gain term. Class A is defined to be when the output is 0 and should be 1; class B is defined to be when the output is 1 and should be 0. This rule specifies a simple method of updating each weight: it tries to minimize the error between the target output and the experimentally obtained output of each neuron by calculating that error, termed the "delta". Each weight is then adjusted by adding to it its delta multiplied by an attenuation constant, usually about 10%. This process is iterated until the total error falls below some threshold. By adding its specific error to each of the weights, the network is moved towards the position of minimum error; by using an attenuation constant rather than the full value of the error, it moves slowly towards this position of minimum error.
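The Widrow-Hoff update described above can be written as a short training loop. The following MATLAB sketch trains a single hard-limit perceptron on the logical AND problem; the data set, the gain value η = 0.1 and the number of passes are illustrative assumptions, not taken from the paper.

% Single-layer perceptron trained with the Widrow-Hoff rule:
%   delta = d - y,   w <- w + eta * delta * x.
X   = [0 0; 0 1; 1 0; 1 1];      % input patterns, one per row
d   = [0; 0; 0; 1];              % desired outputs (AND of the two inputs)
w   = rand(2, 1) - 0.5;          % random initial weights (no prejudice)
b   = rand - 0.5;                % random initial bias
eta = 0.1;                       % attenuation constant, 0 <= eta <= 1
fnl = @(n) double(n > 0);        % hard-limit transfer function

for epoch = 1:100
    for p = 1:size(X, 1)
        x     = X(p, :)';                % present one pattern
        y     = fnl(w' * x + b);         % perceptron output
        delta = d(p) - y;                % error term ("delta")
        w     = w + eta * delta * x;     % weight update
        b     = b + eta * delta;         % bias treated as a weight on a constant input of 1
    end
end
disp(fnl(X * w + b)')                    % should match d' after training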
Each neural layer, i.e. each perceptron, is able to divide a space linearly. In two dimensions this corresponds to drawing a straight line across the Cartesian grid, and in three dimensions to slicing a cube into two pieces along an arbitrary plane. Higher dimensions are partitioned in a similar way, but are impossible to visualize. However, if we cascade a number of such layers, each succeeding layer can perform another linear partitioning, possibly along a different (hyper)plane. Drawing one set of lines on the grid gives a binary partitioning: A or not-A. Partitioning this already partitioned space again gives a further refinement of the specified region, and partitioning that region once more gives an even more refined region. In fact, it can be shown, just as informally, that only three layers are necessary to define any region in n-space: the first layer allows the drawing of "lines", the second layer allows these lines to be combined into "convex hulls", and the third layer allows the convex hulls to be combined into arbitrary regions. One can therefore construct a neural network of three layers of cascaded perceptrons, called the multi-layer perceptron (MLP). [1]

Figure 3: Three-layered multi-layer perceptron

ii) BACK-PROPAGATION PRINCIPLE

The second principle used is the back-propagation principle. The back-propagation learning algorithm is divided into two phases: propagation and weight update.

Phase 1: Propagation. Each propagation involves the following steps:
1. Forward propagation of a training pattern's input through the neural network in order to generate the output activations.
2. Backward propagation of the output activations through the neural network, using the training pattern's target, in order to generate the deltas of all output and hidden neurons.

Phase 2: Weight update. Each weight-synapse follows these steps:
1. Multiply its output delta and input activation to get the gradient of the weight.
2. Subtract a ratio (a percentage) of the gradient from the weight. This ratio influences the speed and quality of learning and is called the learning rate: the greater the ratio, the faster the neuron trains; the lower the ratio, the more accurate the training. The sign of the gradient of a weight indicates where the error is increasing, which is why the weight must be updated in the opposite direction.

Phases 1 and 2 are repeated until the performance of the network is satisfactory.
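The two phases can be illustrated on a tiny one-hidden-layer network with sigmoid units. The network size, learning rate and training pattern in this MATLAB sketch are assumptions chosen only to show the mechanics of one propagation/weight-update cycle.

% One back-propagation cycle for a 2-3-1 sigmoid network.
x  = [0.2; 0.7];                                 % one training pattern
t  = 1;                                          % its target output
W1 = rand(3, 2) - 0.5;  b1 = rand(3, 1) - 0.5;   % input  -> hidden
W2 = rand(1, 3) - 0.5;  b2 = rand(1, 1) - 0.5;   % hidden -> output
lr = 0.5;                                        % learning rate
sigm = @(n) 1 ./ (1 + exp(-n));

% Phase 1: forward propagation, then backward propagation of the deltas.
h = sigm(W1 * x + b1);                           % hidden activations
y = sigm(W2 * h + b2);                           % output activation
delta_out = (y - t) .* y .* (1 - y);             % delta of the output neuron
delta_hid = (W2' * delta_out) .* h .* (1 - h);   % deltas of the hidden neurons

% Phase 2: gradient = output delta * input activation;
% subtract a fraction (the learning rate) of it from each weight.
W2 = W2 - lr * (delta_out * h');
b2 = b2 - lr * delta_out;
W1 = W1 - lr * (delta_hid * x');
b1 = b1 - lr * delta_hid;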
II. METHODOLOGY

Basically, artificial neural networks are used as classifiers for the purposes of artificial intelligence. Generally one can implement an ANN in Matlab in four different formats:

1) Curve fitting tool
2) Pattern recognition
3) Clustering
4) Time series

There are three types of learning procedures:

1) Supervised: In supervised learning we are given a set of example pairs, and the aim is to find a function in the allowed class of functions that matches the examples. In other words, the task is to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data, and it implicitly contains prior knowledge about the problem domain.

2) Unsupervised: In unsupervised learning, some data are given together with a cost function to be minimized, which can be any function of the data and the network's output.

3) Reinforcement: In reinforcement learning, data are not given but are generated by an agent's interactions with the environment. At each point in time the agent performs an action and the environment generates an observation and an instantaneous cost, according to some usually unknown dynamics. The aim is to discover a policy for selecting actions that minimizes some measure of long-term cost, i.e. the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated.

In this article only the supervised learning techniques in the Matlab toolbox are considered, i.e. the fitting tool and the pattern recognition tool.

In the supervised learning method, the input data are first supplied to the input and the respective outputs to the output. This relies on the fact that a given neuron needs to be aware of the state of all inputs to the net, as well as all the outputs of the net, in order to determine whether a weight should be reinforced or suppressed. However, in the three-layer arrangement described, no layer has both pieces of information: the first layer has the input information but no output information, the opposite is true for the output layer, and the middle layer has neither. This dilemma finds its roots in the fact that the hard-limit transfer function provides no information at the output of a neuron regarding its inputs; as such, it is useless for our purposes. To get around this problem, one can make use of other nonlinear functions, as mentioned previously, which behave like the hard-limit function but without its problem-causing discontinuity. By using such functions, we are able to obtain information about the state of the inputs in later layers. It is therefore possible to correct the weights in the final layer, because we now have information about both the outputs and the inputs. Once the final layer is updated, the error at that layer can be propagated back to the previous layer, whose weights are then updated and whose error is computed; that error is in turn propagated back to the layer before it. This process can continue for as many layers as necessary. The learning rule described here is known as the "generalized delta rule", because the error is a "delta" which is propagated back to previous layers.
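Before stating the algorithm, a small numeric sketch (with arbitrarily chosen net-input values, an assumption on my part) shows why the smooth transfer function matters: the deltas are proportional to the derivative of the transfer function at the operating point, which is zero everywhere for the hard limit but nonzero everywhere for a sigmoid.

% Derivative of the hard limit vs. a sigmoid over a range of net inputs.
n             = -4:1:4;                    % sample net inputs to a neuron
hardlim_deriv = zeros(size(n));            % flat everywhere: no error signal
                                           % can pass back through the neuron
sigm          = 1 ./ (1 + exp(-n));
sigm_deriv    = sigm .* (1 - sigm);        % strictly positive, so a usable
                                           % delta reaches earlier layers
disp([n; hardlim_deriv; sigm_deriv])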
The algorithm is as follows. Starting from the output and working backwards, calculate

w_ij(t + 1) = w_ij(t) + η δ_pj o_pj

where w_ij(t) represents the weight from node i to node j at time t, η is a gain term, and δ_pj is an error term for pattern p on node j. For output units,

δ_pj = o_pj (1 − o_pj)(t_pj − o_pj)

and for hidden units,

δ_pj = o_pj (1 − o_pj) Σ_k δ_pk w_jk

where the sum is over the k nodes in the layer above node j.

Among the Matlab tools we consider the fitting tool and the pattern recognition tool.

1) Fitting tool: As the name suggests, the fitting tool basically helps us derive a linear relation between our inputs and outputs, and this is done by the process of regression. It also helps in determining the error percentage and the size of the error through its various plots, such as the error histogram plot, the training plot, the receiver operating characteristic (ROC) plot and the standard deviation plot. The main drawback of this tool is that for data with higher ambiguity it does not provide fruitful results, because here we are restricted to a first-degree curve.

2) Pattern recognition tool: This tool helps in recognising the pattern of our data and trains the network accordingly. Its main advantage is that it can accommodate a higher degree of ambiguity in the data, and its outputs can be extracted in two forms: as in the fitting tool, and as the total number of hits and misses. It incorporates all the plots which the fitting tool provides, plus the confusion matrix plot, which helps in knowing the efficiency of the network with respect to every single class into which we want to classify the data. [7]

III. OPTIMIZATION

Before knowing how to optimize our results we must know the need for optimization. Optimization is required because, for a specific kind of data, the maximum possible efficiency might be, say, 75%, and if we proceed to train the network again and again it goes into an unstable state and its efficiency starts decreasing. Thus we must keep in mind that for the network to perform better we need a specific, optimized number of iterations, a specific number of neurons and a specific number of training runs, because this mainly depends on the ambiguity in the data given to the network: if we have a high error percentage we prefer to increase the number of neurons to get an optimum result, and if even after that the error percentage stays the same, we train the network a greater number of times in order to get its weights properly balanced. The same applies to the number of iterations, i.e. the greater the number of iterations the better the result, but if the number of iterations crosses the optimum limit it leads to a higher error percentage. In the Matlab neural network tool there is a maximum of 1000 iterations, and the results can be displayed in various forms, such as a bar graph, together with other graphs and state diagrams that show the status of the network.
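The paper uses the interactive fitting and pattern recognition tools; the command-line counterparts sketched below are an assumption on my part (fitnet, patternnet, train and the chosen hidden-layer size, data-division ratios and epoch limit are not specified in the paper) and are included only to show where the optimization parameters discussed above appear.

% Hedged sketch of scripted counterparts of the two tools (Neural Network Toolbox).
x = linspace(-1, 1, 200);                     % example inputs (1 x N)
t = sin(3*pi*x) + 0.05*randn(size(x));        % noisy example targets

net = fitnet(10);                             % fitting network, 10 hidden neurons (assumed)
net.divideParam.trainRatio = 0.70;            % training / validation / test split
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
net.trainParam.epochs      = 1000;            % iteration limit discussed above
[net, tr] = train(net, x, t);                 % supervised training
y   = net(x);                                 % network outputs
err = perform(net, t, y);                     % mean-squared-error performance

% For classification, the same workflow uses patternnet with one-hot targets:
%   net = patternnet(10);
%   [net, tr] = train(net, inputs, targets);
%   plotconfusion(targets, net(inputs));      % confusion matrix plot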
Basically we divide the given data into three parts:

1) Training data
2) Validation data
3) Testing data

The training data is used for training or learning, the validation data validates the network, i.e. specifies the range of valid data which the network can properly classify, and the test data is used for testing the network. The percentage of training data is the highest because it is the data which helps in building and balancing the weights; the validation and test percentages can be varied according to convenience and according to the ambiguity or complexity of the data. If we have a large data set with many data points, we can decrease the percentage of training data and increase the percentages of validation and test data. Thus the process of optimization mainly depends on factors such as the ambiguity or complexity of the dataset and its size, and also on identifying the right number of neurons required for the classifier and the number of training runs required to reach the optimum level, because in many cases the classifier network goes through many states and it is necessary to choose the one which meets, or is closest to, the requirements.

IV. CONCLUSION

In this article I have concentrated the area of study on artificial neural networks and their working principles. The study focuses on the supervised learning technique and illustrates it with the fitting tool and the pattern recognition tool from the Matlab neural network toolbox. The advantage of artificial neural networks over basic classification processes is also explained, namely that they greatly reduce the human labour spent in determining and fixing the various rules of classification. It is also described how to obtain optimum results and which factors are involved, such as the number of neurons, the number of training runs and the ambiguity of the data, i.e. by adjusting the number of neurons according to the ambiguity of the data, efficient results can be obtained.

V. ACKNOWLEDGEMENTS

We are very thankful to David J. Cavuto for the thesis "An Exploration and Development of Current Artificial Neural Network Theory and Applications with Emphasis on Artificial Life", submitted to The Cooper Union Albert Nerken School of Engineering, which was very useful in writing this article.

REFERENCES

[1] David J. Cavuto, "An Exploration and Development of Current Artificial Neural Network Theory and Applications with Emphasis on Artificial Life", thesis submitted to The Cooper Union Albert Nerken School of Engineering, May 1997.
[2] Daniel Graupe, "Principles of Artificial Neural Networks", World Scientific Publishing, 2007, Chapters 2, 3, 4, 6.
[3] D. C. Ciresan, U. Meier, J. Masci and J. Schmidhuber, "Multi-Column Deep Neural Network for Traffic Sign Classification", Neural Networks, 2012.
[4] C. Ferreira, "Designing Neural Networks Using Gene Expression Programming", in A. Abraham, B. de Baets, M. Köppen and B. Nickolay (eds.), Applied Soft Computing Technologies: The Challenge of Complexity, pages 517-536, Springer-Verlag, 2006.
[5] S. Dominic, R. Das, D. Whitley and C. Anderson, "Genetic Reinforcement Learning for Neural Networks", IJCNN-91-Seattle International Joint Conference on Neural Networks, Seattle, Washington, USA, IEEE, July 1991. doi:10.1109/IJCNN.1991.155315. ISBN 0-7803-0164-1. Retrieved 29 July 2012.
[6] Ingrid Russell, "Neural Networks Module". Retrieved 2012.
[7] Description of the Neural Networks tools from the MathWorks website.