2. The Artificial Neuron
• An artificial neuron (AN), or neuron, implements a nonlinear mapping from $\mathbb{R}^I$, usually to [0, 1] or [−1, 1], depending on the activation function used. That is,
$$f_{AN}: \mathbb{R}^I \to [0, 1] \quad \text{or} \quad f_{AN}: \mathbb{R}^I \to [-1, 1]$$
where I is the number of input signals to the AN.
• An AN receives a vector of I input signals, $z = (z_1, z_2, \ldots, z_I)$, either from the environment or from other ANs. To each input signal, $z_i$, is associated a weight, $v_i$, to strengthen or deplete the input signal. The AN computes the net input signal, and uses an activation function $f_{AN}$ to compute the output signal, $o$, given the net input. The strength of the output signal is further influenced by a threshold value, $\theta$, also referred to as the bias.
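Putting the pieces together, a minimal Python sketch of a single AN's output computation (the sigmoid choice and all names and values here are illustrative, not prescribed by the slides):

```python
import numpy as np

def neuron_output(z, v, theta, f):
    """Compute o = f(net - theta), with net the weighted sum of inputs."""
    net = np.dot(z, v)
    return f(net - theta)

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
print(neuron_output(np.array([0.5, -1.0]), np.array([0.8, 0.3]), 0.2, sigmoid))
```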
3. Calculating the Net Input Signal
• The net input signal to an AN is usually computed as the weighted sum of all input signals,
$$net = \sum_{i=1}^{I} z_i v_i$$
• Artificial neurons that compute the net input signal as the weighted
sum of input signals are referred to as summation units (SU).
• An alternative way to compute the net input signal is to use product units (PU), where
$$net = \prod_{i=1}^{I} z_i^{v_i}$$
• Product units allow higher-order combinations of inputs, having the
advantage of increased information capacity.
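The two net input computations can be sketched in a few lines of Python (a minimal illustration; the function names are mine, and the product-unit example assumes positive inputs so that the powers stay real-valued):

```python
import numpy as np

def net_su(z, v):
    """Summation unit: weighted sum, net = sum_i z_i * v_i."""
    return np.dot(z, v)

def net_pu(z, v):
    """Product unit: net = prod_i z_i ** v_i (positive inputs assumed here)."""
    return np.prod(np.power(z, v))

z = np.array([0.5, 1.0, 2.0])   # input signals
v = np.array([0.2, -0.4, 0.1])  # weights
print(net_su(z, v), net_pu(z, v))
```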
4. Activation Functions
The function $f_{AN}$ receives the net input signal and bias, and determines the output (or firing strength) of the neuron. This function is referred to as the activation function.
Frequently used activation functions are:
1. Linear function
2. Step function
3. Ramp function
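A short Python sketch of these three functions (the specific slope, output levels, and clipping bounds below are illustrative choices; the exact parameterizations vary):

```python
import numpy as np

def linear(x, beta=1.0):
    """Linear activation: f(x) = beta * x."""
    return beta * x

def step(x):
    """Binary step: 1 for non-negative net input, 0 otherwise."""
    return np.where(x >= 0, 1.0, 0.0)

def ramp(x, lam=1.0):
    """Ramp: linear on [-lam, lam], clamped to -lam or lam outside it."""
    return np.clip(x, -lam, lam)
```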
6. Artificial Neuron Geometry
• A neuron that uses a summation unit realizes a hyperplane $\sum_{i=1}^{I} z_i v_i = \theta$ in input space. The hyperplane forms the boundary between the input vectors associated with the two output values.
• Figure 2.3 illustrates the decision boundary for a neuron with the ramp activation function. The hyperplane separates the input vectors for which $\sum_{i=1}^{I} z_i v_i - \theta \geq 0$ from the input vectors for which $\sum_{i=1}^{I} z_i v_i - \theta < 0$.
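A tiny sketch of this separation for a two-input neuron (the weights and threshold are arbitrary illustrative values):

```python
import numpy as np

v, theta = np.array([1.0, 1.0]), 0.5  # hyperplane: z1 + z2 = 0.5

def side(z):
    """Return which side of the hyperplane the input vector lies on."""
    return 1 if np.dot(z, v) - theta >= 0 else 0

print(side(np.array([1.0, 1.0])), side(np.array([0.0, 0.0])))  # 1 0
```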
7. • Figure 2.4 shows how two Boolean functions, AND and OR, can be implemented using a single perceptron. These are examples of linearly separable functions; a sketch follows below.
• Figure 2.5 is an example of a Boolean function that is not linearly separable: XOR.
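A minimal sketch of a step-activated perceptron computing AND and OR (the particular weights and thresholds below are one valid choice among many):

```python
import numpy as np

def perceptron(z, v, theta):
    """Step-activated perceptron: fires 1 if the weighted sum reaches theta."""
    return 1 if np.dot(z, v) >= theta else 0

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
# AND: both weights 1, threshold 1.5; OR: both weights 1, threshold 0.5.
print([perceptron(np.array(z), np.array([1, 1]), 1.5) for z in inputs])  # [0, 0, 0, 1]
print([perceptron(np.array(z), np.array([1, 1]), 0.5) for z in inputs])  # [0, 1, 1, 1]
# No single (v, theta) reproduces XOR: [0, 1, 1, 0] is not linearly separable.
```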
8. Artificial Neuron Learning
• There are three main types of learning:
• Supervised learning, where the neuron (or NN) is provided with a data set consisting of input vectors and a target (desired output) associated with each input vector. This data set is referred to as the training set.
The aim of supervised training is then to adjust the weight values such that the error between the real output, $o = f(net - \theta)$, of the neuron and the target output, $t$, is minimized.
• Unsupervised learning, where the aim is to discover patterns or
features in the input data with no assistance from an external source.
Many unsupervised learning algorithms basically perform a
clustering of the training patterns.
• Reinforcement learning, where the aim is to reward the neuron (or
parts of a NN) for good performance, and to penalize the neuron for
bad performance.
9. Augmented Vectors
• An artificial neuron is characterized by its weight vector v, threshold θ
and activation function.
• During learning, both the weights and the threshold are adapted.
• To simplify learning equations, the input vector is augmented to include an additional input unit, $z_{I+1}$, referred to as the bias unit.
• The value of $z_{I+1}$ is always −1, and the weight $v_{I+1}$ serves as the value of the threshold. The net input signal to the AN (assuming SUs) is then calculated as
$$net = \sum_{i=1}^{I+1} z_i v_i = \sum_{i=1}^{I} z_i v_i - \theta$$
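A brief sketch of the equivalence (illustrative values):

```python
import numpy as np

z = np.array([0.5, 1.0])    # original inputs
v = np.array([0.2, -0.4])   # original weights
theta = 0.3                 # threshold

z_aug = np.append(z, -1.0)  # bias unit z_{I+1} = -1
v_aug = np.append(v, theta) # weight v_{I+1} = theta

# Both forms give the same net input signal.
assert np.isclose(np.dot(z_aug, v_aug), np.dot(z, v) - theta)
```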
10. Gradient Descent Learning Rule
• GD requires the definition of an error (or objective) function to measure the neuron’s error in approximating the target.
• The sum of squared errors is usually used,
$$E = \sum_{p=1}^{P_T} (t_p - o_p)^2$$
where $t_p$ and $o_p$ are respectively the target and actual output for the p-th pattern, and $P_T$ is the total number of input-target vector pairs (patterns) in the training set.
• The aim of GD is to find the weight values that minimize E. This is achieved by calculating the gradient of E in weight space, and moving the weight vector along the negative gradient (as illustrated for a single weight in Figure 2.6).
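The error function itself is a one-liner (a minimal sketch):

```python
import numpy as np

def sse(t, o):
    """Sum of squared errors E over all P_T patterns."""
    return np.sum((np.asarray(t) - np.asarray(o)) ** 2)

print(sse([1.0, 0.0, 1.0], [0.9, 0.2, 0.7]))  # 0.14
```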
11. • Given a single training pattern, weights are updated using
$$v_i(t) = v_i(t-1) + \Delta v_i(t)$$
where
$$\Delta v_i(t) = \eta \left( -\frac{\partial E}{\partial v_i} \right) \quad \text{and} \quad \frac{\partial E}{\partial v_i} = -2(t_p - o_p) \frac{\partial f}{\partial net_p} z_{i,p}$$
and $\eta$ is the learning rate (i.e. the size of the steps taken in the negative direction of the gradient).
• The calculation of the partial derivative of f with respect to $net_p$ (the net input for pattern p) presents a problem for all discontinuous activation functions, such as the step and ramp functions; $z_{i,p}$ is the i-th input signal corresponding to pattern p.
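A minimal sketch of a single GD update for a differentiable activation, following the chain-rule expression above (function names are mine; augmented vectors are assumed, so the bias is folded into the weights):

```python
import numpy as np

def gd_update(v, z_p, t_p, eta, f, df):
    """One gradient-descent step on a single pattern for a summation unit.

    f and df are a differentiable activation and its derivative w.r.t. net.
    Augmented vectors are assumed, so the bias is folded into v and z_p.
    """
    net_p = np.dot(z_p, v)
    o_p = f(net_p)
    dE_dv = -2.0 * (t_p - o_p) * df(net_p) * z_p  # chain rule, as above
    return v + eta * (-dE_dv)                     # step along the negative gradient
```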
12. Widrow-Hoff Learning Rule
• For the Widrow-Hoff learning rule, assume that $f = net_p$. Then $\partial f / \partial net_p = 1$, giving
$$\frac{\partial E}{\partial v_i} = -2(t_p - o_p) z_{i,p}$$
• Weights are then updated using
$$v_i(t) = v_i(t-1) + 2\eta (t_p - o_p) z_{i,p}$$
• The Widrow-Hoff learning rule, also referred to as the least-mean-square (LMS) algorithm, was one of the first algorithms used to train layered neural networks with multiple adaptive linear neurons.
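A minimal sketch of the Widrow-Hoff update under the linear-activation assumption above (names are illustrative):

```python
import numpy as np

def widrow_hoff_step(v, z_p, t_p, eta):
    """LMS step: with f = net_p the derivative is 1, so the update simplifies."""
    o_p = np.dot(z_p, v)                      # linear activation: o_p = net_p
    return v + 2.0 * eta * (t_p - o_p) * z_p
```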
13. Generalized Delta Learning Rule
• The generalized delta learning rule is a generalization of the Widrow-Hoff learning rule that assumes differentiable activation functions. Assume that the sigmoid function (from equation (2.11)) is used. Then,
$$\frac{\partial f}{\partial net_p} = o_p (1 - o_p)$$
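A minimal sketch of the resulting update, specializing the GD step above to the sigmoid (names are illustrative):

```python
import numpy as np

def delta_step(v, z_p, t_p, eta):
    """Generalized delta step: sigmoid activation, df/dnet = o_p * (1 - o_p)."""
    o_p = 1.0 / (1.0 + np.exp(-np.dot(z_p, v)))
    return v + 2.0 * eta * (t_p - o_p) * o_p * (1.0 - o_p) * z_p
```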
14. Error-Correction Learning Rule
• For the error-correction learning rule it is assumed that binary-valued activation functions are used, for example, the step function.
• Weights are only adjusted when the neuron responds in error, that is, when $t_p \neq o_p$; a sketch follows below.
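A minimal sketch of this rule with a binary step activation (the perceptron-style update $\eta(t_p - o_p)z_p$ is used here as one common form; treat the details as illustrative):

```python
import numpy as np

def error_correction_step(v, z_p, t_p, eta):
    """Update weights only when the step-activated neuron's output is wrong."""
    o_p = 1.0 if np.dot(z_p, v) >= 0 else 0.0  # binary step (bias folded into v)
    if o_p != t_p:                             # neuron responded in error
        v = v + eta * (t_p - o_p) * z_p
    return v
```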