International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976-6480 (Print), ISSN 0976-6499 (Online), Volume 5, Issue 3, March (2014), pp. 208-215, © IAEME
SUPERVISED LEARNING IN ARTIFICIAL NEURAL NETWORKS
Dr. Rajeshwari S. Mathad
Department of Electronics, Basaveshwar Science College, Bagalkot, India
ABSTRACT
This review article describes the artificial neural network (ANN), its working principle, and its optimization. The artificial neural network is derived from the biological neural network; the key factor in the whole ANN is its weight-updating capacity, which employs the back-propagation technique to improve its efficiency. I describe two neural network tools of Matlab, the fitting tool and the pattern recognition tool from the Neural Network Toolbox, both of which come under the supervised learning technique. I also describe the optimization of the neural network and the different factors involved in its optimization.
Keywords: ANN (Artificial Neural Network).
I. INTRODUCTION
The nerve cell, or neuron, has four general regions, each defined by its physical position in the cell as well as its function. The cell body has two types of interconnection structures which emerge from it: dendrites and the axon. Each neuron generally has only one axon, but typically has many dendrites. The axon carries the nerve signal away from the cell body to other neurons. Dendrites carry signals in towards the cell body from the axons of other neurons. As such, the basic nerve cell can be thought of as a "black box" which has a number of inputs (dendrites) and only one output (the axon). A substance called a neurotransmitter flows across the gap from one neuron to another, thereby acting as a chemical "bridge" for the neural signal. This gap is consequently known as a chemical synapse. The neurotransmitter material consists of sodium and potassium ions, which in turn help in creating a depolarization across the cell membrane near the synapse. The depolarized region occurs over a small portion of the cell membrane, usually along the axon, and this drastic change in polarization is known as the action potential. The action potential propagates down the axon like an electrical pulse radiating out from its source. Thus the cell body acts as a summing amplifier which multiplies by an attenuating factor and then sends out the final output.
Similar to the above discussion, an artificial neural network consists of nodes which perform the same function as a neuron in a nerve cell, and its output and efficiency can be varied by
updating its weights with respect to its inputs and outputs. The weights on the nodes play a key role because every single node is identical, and therefore nodes can be differentiated only by their weights. Usually the weights and biases are initialized randomly, so as not to impose a prejudice on the network. Therefore, we must have some facility for changing the weights and biases in a systematic fashion. This is referred to as a learning rule, because the process of updating the weights and biases is thought of as training the network [1-6].
i) BASIC PRINCIPLE OF WEIGHTS
Figure 1: Each input X is attenuated or amplified by a constant weight W
Each input X is attenuated or amplified by a constant weight W, which corresponds to the physical attenuation at a synapse. This describes a single node; in the case of multiple nodes, a bias also controls the final output, i.e. the model uses some sort of nonlinearity to capture the thresholding effect. The hard-limit function, also referred to as the Heaviside function, is used: this function is +1 when its input is greater than zero and 0 when its input is less than or equal to zero. However, it is often required to threshold at a definite value, rather than simply above or below zero; a bias b added to the argument of the hard-limit function shifts its discontinuity along the real axis. Summarising all together, the formula obtained for the output of this model neuron is:
O = f_nl(Σᵢ xᵢwᵢ + b)
where f_nl is a nonlinear transfer function, e.g. the hard-limit function. The neurons are generally arranged in parallel to form layers. Each neuron in a layer has the same number of inputs, equal to the total number of inputs to that layer. Therefore, a layer of p neurons has a total of n inputs and a total of p outputs, with each neuron having exactly one output and n inputs. All the neurons in a layer have the same nonlinear transfer function.
Figure-2: Block diagram of neural model
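The model-neuron formula above can be sketched in a few lines of Python (an illustrative sketch, not part of the original paper; the function names are my own):

```python
def hard_limit(x):
    """Heaviside hard-limit: +1 if the input is greater than zero, else 0."""
    return 1 if x > 0 else 0

def neuron_output(inputs, weights, bias, fnl=hard_limit):
    """O = fnl(sum(x_i * w_i) + b): weighted sum, shifted by the bias,
    passed through a nonlinear transfer function."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return fnl(s)

# The bias of -0.5 shifts the discontinuity: 0.4 + 0.3 - 0.5 = 0.2 > 0, so the neuron fires
print(neuron_output([1.0, 0.5], [0.4, 0.6], -0.5))
```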
- 3. International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976 –
6480(Print), ISSN 0976 – 6499(Online) Volume 5, Issue 3, March (2014), pp. 208-215, © IAEME
210
Neurons arranged in a layer in this manner are referred to as a perceptron. Inputs to the layer are applied in parallel to all neural inputs simultaneously. For each neuron k in [1, p], each input xᵢ is multiplied by the weight wᵢ, for i in [1, n], and then summed. Each sum is then added to its bias b_k and passed through the nonlinear function. At the conclusion of the process, the layer outputs appear in parallel. Mathematically, if the inputs are presented as a row vector X, the layer outputs can be found from the matrix product XW + B, which results in a (1 × p) row vector. Applying the nonlinear transfer function to each of its elements gives the outputs of the neural layer. Perceptrons are trained in a supervised learning fashion. Perceptrons are also generally arranged in a strictly feed-forward fashion; there are usually no closed loops or cycles. The learning rule is then applied to the layer. A simple and widely used learning rule is the Widrow-Hoff rule:
Δ = d(t) − y(t)
wᵢ(t+1) = wᵢ(t) + ηΔxᵢ(t)
d(t) = +1, if input from class A
d(t) = 0, if input from class B
where 0 ≤ η ≤ 1 is a positive gain term. Class A is defined to be when the output is 0 and should be 1; class B is defined to be when the output is 1 and should be 0.
The above rule specifies a simple method of updating each weight. It tries to minimize the error between the target output and the experimentally obtained output of each neuron by calculating that error, termed the "delta." Each weight is then adjusted by adding to it its delta multiplied by an attenuation constant, usually about 10%. This process is iterated until the total error falls below some threshold. By adding its specific error to each of the weights, it is ensured that the network is moved towards the position of minimum error. By using an attenuation constant rather than the full value of the error, we move it slowly towards this position of minimum error.
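The Widrow-Hoff update can be turned into a short training loop (an illustrative sketch, not from the paper; training on logical OR, which is linearly separable, is my own choice):

```python
def hard_limit(x):
    return 1 if x > 0 else 0

def predict(w, x):
    """Neuron output with w[0] acting as the bias."""
    return hard_limit(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def train_perceptron(samples, n_inputs, eta=0.1, epochs=100):
    """Widrow-Hoff updates: delta = d(t) - y(t), then
    w_i <- w_i + eta * delta * x_i (the bias sees a fixed input of 1)."""
    w = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        total_error = 0
        for x, d in samples:
            delta = d - predict(w, x)        # the "delta" error term
            w[0] += eta * delta              # bias update
            for i in range(n_inputs):
                w[i + 1] += eta * delta * x[i]
            total_error += abs(delta)
        if total_error == 0:                 # total error below threshold: stop
            break
    return w

# Learn logical OR; a single perceptron suffices for a linearly separable class
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(data, 2)
```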
Each neural layer, i.e. each perceptron, is able to divide a space linearly. In two dimensions this amounts to drawing a straight line across the Cartesian grid, and in three dimensions to slicing a cube into two pieces along an arbitrary plane. Higher dimensions are partitioned in a similar way, but are impossible to visualize. However, if we cascade a number of such processes, each succeeding layer can perform another linear partitioning, possibly along a different (hyper)plane. Drawing one set of lines on the grid gives a binary partitioning: A-or-not-A. If we take this already partitioned space and partition it further, we obtain a further refinement of the specified region, and partitioning that region once again gives an even more refined region. In fact, it can be shown, just as informally, that only three layers are necessary to define any region in n-space. The first layer allows the drawing of "lines". The second layer allows the combining of these lines into "convex hulls". The third layer allows the convex hulls to be combined into arbitrary regions. So one can construct a neural network of three layers of cascaded perceptrons. This is called the multi-layer perceptron, or MLP [1].
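The lines-to-regions argument can be made concrete with hand-picked weights (a hypothetical example, not from the paper): two first-layer "lines" (an OR line and a NAND line) intersected by a second layer carve out the XOR region, which no single linear partition can produce:

```python
def step(x):
    return 1 if x > 0 else 0

def layer(inputs, weights, biases):
    """One perceptron layer: step(x . w + b) for each neuron."""
    return [step(sum(x * w for x, w in zip(inputs, col)) + b)
            for col, b in zip(weights, biases)]

def xor_net(x1, x2):
    # First layer draws two lines: h1 = (x1 OR x2), h2 = NOT(x1 AND x2)
    h = layer([x1, x2], [(1, 1), (-1, -1)], [-0.5, 1.5])
    # Second layer intersects the two half-planes (h1 AND h2) into a convex region
    return layer(h, [(1, 1)], [-1.5])[0]
```

Intersecting the region above the OR line with the region below the AND line yields exactly the XOR set {(0,1), (1,0)}.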
Figure-3: The three-layer Multi-layer Perceptron
ii) BACK PROPAGATION PRINCIPLE
The second principle used is the back-propagation principle. The back-propagation learning algorithm is divided into two phases: propagation and weight update.
Phase 1: Propagation
Each propagation involves the following steps:
1. Forward propagation of a training pattern's input through the neural network in order to
generate the propagation's output activations.
2. Backward propagation of the propagation's output activations through the neural network
using the training pattern target in order to generate the deltas of all output and hidden
neurons.
Phase 2: Weight update
Each weight-synapse follows these steps:
1. Multiply its output delta and input activation to get the gradient of the weight.
2. Subtract a ratio (a percentage) of the gradient from the weight.
This ratio influences the speed and quality of learning; it is called the learning rate. The greater the ratio, the faster the neuron trains; the lower the ratio, the more accurate the training is. The sign of the gradient of a weight indicates where the error is increasing; this is why the weight must be updated in the opposite direction.
Repeat phase 1 and 2 until the performance of the network is satisfactory.
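The two weight-update steps of phase 2 reduce to a one-line rule, sketched here (the function name and numbers are illustrative, not from the paper):

```python
def update_weight(w, output_delta, input_activation, learning_rate=0.1):
    """Phase 2 for one weight-synapse: the gradient is the product of the
    output delta and input activation; the weight is moved against it,
    scaled by the learning rate."""
    gradient = output_delta * input_activation
    return w - learning_rate * gradient

# 0.5 - 0.1 * (0.8 * 1.0) = 0.42: the weight steps opposite the gradient
w_new = update_weight(0.5, output_delta=0.8, input_activation=1.0)
```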
II. METHODOLOGY
Basically, artificial neural networks are used as classifiers for the purposes of artificial intelligence.
Generally one can implement an ANN in Matlab in four different formats:
1) Curve fitting tool
2) Pattern recognition
3) Clustering
4) Time series
There are three types of learning procedures:
1) Supervised:
In supervised learning, we are given a set of example input-output pairs, and the aim is to find a function in the allowed class of functions that matches the examples. In other words, the task is to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data, and it implicitly contains prior knowledge about the problem domain.
2) Unsupervised:
In unsupervised learning, some data are given, and the cost function to be minimized can be any function of the data and the network's output.
3) Reinforcement:
In reinforcement learning, data are usually not given, but generated by an agent's interactions with the environment. At each point in time, the agent performs an action and the environment generates an observation and an instantaneous cost, according to some usually unknown dynamics. The aim is to discover a policy for selecting actions that minimizes some measure of a long-term cost, i.e. the expected cumulative cost. The environment's dynamics and long-term cost for each policy are usually unknown, but can be estimated.
In this article we consider only the analysis of supervised learning techniques, through the first two Matlab toolbox tools, i.e. the fitting tool and the pattern recognition tool.
In the supervised learning method, we first supply the input data to the input and the respective output to the output. This relies on the fact that a given neuron needs to be aware of the state of all inputs to the net, as well as all the outputs of the net, in order to determine whether its weight should be reinforced or suppressed. However, in the three-layer arrangement described, no layer has both pieces of information: the first layer has the input information but no output information, the opposite is true for the output layer, and the middle layer has neither. This dilemma finds its roots in the fact that the hard-limit transfer function provides no information to the output of a neuron regarding its inputs. As such, it is useless for our purposes. To get around this problem, one can make use of other nonlinear functions, as mentioned previously, which have transfer functions similar to the hard-limit function but without its problem-causing discontinuity. By using other functions, we are able to obtain information about the state of the inputs in later layers. Therefore, it is possible to correct the weights in the final layer, because one now has information about both the outputs and the inputs. Once we update the final layer, we can back-propagate the error at the final layer to the previous layer; it is then possible to update the weights at the previous layer and compute its error. That error is then propagated back to the layer before it. This process can continue for as many layers as necessary. The learning rule described here is known as the "generalized delta rule" because the error is a "delta" which is propagated back to previous layers.
The algorithm is as follows. Starting from the output and working backwards, calculate the following:
w_ij(t+1) = w_ij(t) + η δ_pj o_pi
where w_ij(t) represents the weight from node i to node j at time t, η is a gain term, and δ_pj is an error term for pattern p on node j.
For output units:
δ_pj = o_pj (1 − o_pj)(t_pj − o_pj)
For hidden units:
δ_pj = o_pj (1 − o_pj) Σ_k δ_pk w_jk
where the sum is over the k nodes in the layer above node j.
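These update equations can be sketched in pure Python, assuming a logistic (sigmoid) transfer function, whose derivative supplies the o(1 − o) factor. The network size, data set, and function names are my own choices for illustration (logical AND is used because it trains reliably), not the paper's:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(data, n_hidden=2, eta=0.5, epochs=10000, seed=42):
    """Generalized delta rule on a 2-input, one-hidden-layer, one-output net."""
    random.seed(seed)
    w_ih = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(2)]
    b_h = [0.0] * n_hidden
    w_ho = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_o = 0.0

    def forward(x):
        h = [sigmoid(sum(x[i] * w_ih[i][j] for i in range(2)) + b_h[j])
             for j in range(n_hidden)]
        o = sigmoid(sum(h[j] * w_ho[j] for j in range(n_hidden)) + b_o)
        return h, o

    for _ in range(epochs):
        for x, t in data:
            h, o = forward(x)
            d_o = o * (1 - o) * (t - o)               # output-unit delta
            d_h = [h[j] * (1 - h[j]) * d_o * w_ho[j]  # hidden-unit deltas
                   for j in range(n_hidden)]
            for j in range(n_hidden):                 # w_ij += eta * delta_j * o_i
                w_ho[j] += eta * d_o * h[j]
            b_o += eta * d_o
            for i in range(2):
                for j in range(n_hidden):
                    w_ih[i][j] += eta * d_h[j] * x[i]
            for j in range(n_hidden):
                b_h[j] += eta * d_h[j]
    return forward

# Learn logical AND with the trained forward pass
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
net = train(and_data)
```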
In the case of Matlab tools, we consider the fitting tool and the pattern recognition tool.
1) Fitting tool: As the name suggests, the fitting tool helps in deriving a linear equation between our inputs and outputs, and this is done by the process of regression. It also helps in determining the error percentage and the size of the error through its various plots, such as the error histogram plot, training plot, receiver operating characteristic (ROC) plot, and standard deviation plot. The main drawback of this tool is that for data with higher ambiguity it does not provide fruitful results, because here we are restricted to only a first-degree curve.
2) Pattern recognition tool: This tool helps in recognising the pattern of our data and trains the network accordingly. The main advantage of this tool is that it can accommodate a higher order of ambiguity in the data, and its outputs can be extracted in both forms: in the form of a fit and in the form of the total number of hits and misses. It incorporates all the plots which the fitting tool provides, plus a confusion matrix plot, which helps us in knowing the efficiency of the network with respect to every single class into which we ought to classify the data [7].
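The first-degree restriction of the fitting tool can be illustrated with a closed-form least-squares line fit; this pure-Python sketch is an analogy for the underlying regression idea, not the Matlab tool's implementation, and the data are invented:

```python
def fit_line(xs, ys):
    """Least-squares first-degree fit y = a*x + b, in closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]                 # data lying exactly on y = 2x + 1
a, b = fit_line(xs, ys)
# residuals like these feed an error histogram; here they are all zero
errors = [y - (a * x + b) for x, y in zip(xs, ys)]
```

For data that is not close to any straight line, the residuals stay large no matter how the fit is chosen, which is the "higher ambiguity" failure mode described above.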
III. OPTIMIZATION
Before examining how to optimize our result, we must know the need for optimisation. Optimization is required because, for a specific kind of data, the maximum possible efficiency might be, say, 75%, and if we proceed to train our network again and again it goes into an unstable state and its efficiency starts decreasing. Thus we must keep in mind that for our network to perform better we need a specific, optimized number of iterations, a specific number of neurons, and a specific number of training runs, because optimization mainly depends on the ambiguity of the data which we are giving to our network. That is, if we have a high error percentage we prefer to increase the number of neurons to get an optimum result, and if even after that we have the same error percentage, we train our network more times in order to get its weights properly balanced. The same applies to the number of iterations: the greater the number of iterations, the better the result, but if the number of iterations crosses the optimum limit, it leads to a higher error percentage.
In the Matlab neural network tool there is a maximum of 1000 iterations, and the results can be displayed in various forms, such as a bar graph; there are also other graphs and state diagrams to show the status of the network.
Basically, we divide our given data into three parts:
1) Training data 2) Validation data 3) Testing data
The training data is used for training or learning purposes; the validation data validates the network, or specifies the range of valid data which our network can properly classify; and finally the test data is used for testing the network. The percentage of training data is the highest because it is the data which helps in building and balancing the weights, while the validation and test percentages can be varied according to our convenience and according to the ambiguity or complexity of the data. If we have a large data set, we can decrease the percentage of training data and increase the percentages of validation and test data.
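The three-way split can be sketched as follows (a hypothetical helper; the 70/15/15 percentages are a commonly used default, not a requirement of the tools):

```python
def split_data(samples, train_pct=70, val_pct=15, test_pct=15):
    """Partition a dataset into training/validation/test slices by percentage."""
    assert train_pct + val_pct + test_pct == 100
    n = len(samples)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return (samples[:n_train],                    # used to build/balance weights
            samples[n_train:n_train + n_val],     # used to validate the network
            samples[n_train + n_val:])            # held out for final testing

train_set, val_set, test_set = split_data(list(range(100)))
```

In practice the samples would be shuffled before splitting so that each partition is representative of the whole dataset.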
Thus we say that the process of optimization mainly depends on factors like the ambiguity or complexity of our dataset and its size; it also depends on identifying the right number of neurons required for the classifier and the number of training runs required to reach the optimum level, because in many cases the classifier network goes through many states, and it is necessary to choose the one which meets our needs or is closest to our requirements.
IV. CONCLUSION
In this article I have concentrated the area of study on the artificial neural network and its working principles. The study is focused on the supervised learning technique, illustrated by the fitting tool and the pattern recognition tool from the Matlab Neural Network Toolbox. The advantage of the artificial neural network over a basic classification process is also explained: it mainly reduces the human labour applied in determining and fixing the various rules of classification. It is also described how to obtain optimum results and which factors are involved, such as the number of neurons, the number of training runs, and the ambiguity of the data, i.e. by adjusting the number of neurons according to the ambiguity of the data, efficient results can be obtained.
V. ACKNOWLEDGEMENTS
We are very thankful to David J. Cavuto for the thesis "An Exploration and Development of Current Artificial Neural Network Theory and Applications with Emphasis on Artificial Life", submitted to The Cooper Union Albert Nerken School of Engineering, which was very useful in writing this article.
REFERENCES
Theses
[1] David J. Cavuto “An Exploration and Development of Current Artificial Neural Network
Theory and Applications with Emphasis on Artificial Life”. Submitted to The Cooper Union
Albert Nerken School of Engineering, May 1997.
Book
[2] Daniel Graupe, "Principles of Artificial Neural Networks", World Scientific Publishing Pvt. Ltd., 2007, Chapters 2, 3, 4, 6.
[3] D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber. Multi-Column Deep Neural Network for
Traffic Sign Classification. Neural Networks, 2012.
[4] Ferreira, C. (2006). "Designing Neural Networks Using Gene Expression Programming". In
A. Abraham, B. de Baets, M. Köppen, and B. Nickolay, eds., Applied Soft Computing
Technologies: The Challenge of Complexity, pages 517–536, Springer-Verlag.
[5] Dominic, S., Das, R., Whitley, D., Anderson, C. (July 1991). "Genetic reinforcement learning
for neural networks". IJCNN-91-Seattle International Joint Conference on Neural Networks.
IJCNN-91-Seattle International Joint Conference on Neural Networks. Seattle, Washington,
USA: IEEE. doi:10.1109/IJCNN.1991.155315. ISBN 0-7803-0164-1. Retrieved 29 July 2012.
[6] Russell, Ingrid. "Neural Networks Module". Retrieved 2012.
[7] Description of the Neural Network tools from the MathWorks website.