2. What is an Artificial Neural Network?
• The term "Artificial Neural Network" is derived from the biological neural
networks that make up the structure of the human brain. Just as the
human brain has neurons interconnected with one another,
artificial neural networks have neurons that are interconnected
with one another in the various layers of the network. These neurons are
known as nodes.
https://www.javatpoint.com/artificial-neural-network
3. Neural Network
• NNs are constructed and implemented to model the human brain.
• They perform various tasks such as pattern matching, classification,
optimization, function approximation, vector quantization and data
clustering.
• These tasks are difficult for traditional computers.
4. ANN
• An ANN possesses a large number of processing elements called
nodes/neurons, which operate in parallel.
• Neurons are connected to one another by connection links.
• Each link is associated with a weight, which contains information about
the input signal.
• Each neuron has an internal state of its own, called its activation level,
which is a function of the inputs the neuron receives.
• In short, an artificial neural network consists of a pool of simple
processing units which communicate by sending signals to each other
over a large number of weighted connections.
5. Artificial Neural Network
The major aspects of a parallel distributed model include:
• a set of processing units (cells).
• a state of activation for every unit, which is equivalent to the output of the
unit.
• connections between the units. Generally each connection is defined by a
weight.
• a propagation rule, which determines the effective input of a unit from its
external inputs.
• an activation function, which determines the new level of activation based
on the effective input and the current activation.
• an external input for each unit.
• a method for information gathering (the learning rule).
• an environment within which the system must operate, providing input
signals and, if necessary, error signals.
6. Artificial Neural Networks
• The “building blocks” of neural networks are the
neurons.
• In technical systems, we also refer to them as units or nodes.
• Basically, each neuron
– receives input from many other neurons.
– changes its internal state (activation) based on the current
input.
– sends one output signal to many other neurons, possibly
including its input neurons (recurrent network).
7. Artificial Neural Networks
• Information is transmitted as a series of electric
impulses, so-called spikes.
• The frequency and phase of these spikes encode the
information.
• In biological systems, one neuron can be connected to as
many as 10,000 other neurons.
• Usually, a neuron receives its information from other
neurons in a confined area, its so-called receptive field.
8. How do ANNs work?
An artificial neural network (ANN) is either a hardware
implementation or a computer program which strives to
simulate the information processing capabilities of its biological
exemplar. ANNs are typically composed of a great number of
interconnected artificial neurons. The artificial neurons are
simplified models of their biological counterparts.
ANN is a technique for solving problems by constructing software
that works like our brains.
9. How do our brains work?
The brain is a massively parallel information processing system.
Our brains are a huge network of processing elements. A typical
brain contains a network of 10 billion neurons.
10. How do our brains work?
A processing element
Dendrites: Input
Cell body: Processor
Synapse: Link
Axon: Output
11. From Biological to Artificial Neurons
The Neuron - A Biological Information Processor
• dendrites - the receivers
• soma - neuron cell body (sums input signals)
• axon - the transmitter
• synapse - point of transmission
• neuron activates after a certain threshold is met
Learning occurs via electro-chemical changes in
effectiveness of synaptic junction.
12. From Biological to Artificial Neurons
An Artificial Neuron - The Perceptron
• simulated on hardware or by software
• input connections - the receivers
• node, unit, or PE simulates neuron body
• output connection - the transmitter
• activation function employs a threshold or bias
• connection weights act as synaptic junctions
Learning occurs via changes in value of the connection
weights.
14. How do our brains work?
A processing element
A neuron is connected to other neurons through about 10,000
synapses
15. How do our brains work?
A processing element
A neuron receives input from other neurons. Inputs are combined.
16. How do our brains work?
A processing element
Once input exceeds a critical level, the neuron discharges a spike ‐
an electrical pulse that travels from the body, down the axon, to
the next neuron(s)
17. How do our brains work?
A processing element
The axon endings almost touch the dendrites or cell body of the
next neuron.
18. How do our brains work?
A processing element
Transmission of an electrical signal from one neuron to the next is
effected by neurotransmitters.
19. How do our brains work?
A processing element
Neurotransmitters are chemicals which are released from
the first neuron and which bind to the second.
20. How do our brains work?
A processing element
This link is called a synapse. The strength of the signal that
reaches the next neuron depends on factors such as the amount of
neurotransmitter available.
21. How do ANNs work?
An artificial neuron is an imitation of a human neuron
22. How do ANNs work?
• Now, let us have a look at the model of an artificial neuron.
23. How do ANNs work?
Figure: a single neuron. The inputs x1, x2, …, xm enter a processing unit (∑), which produces the output y.
$$y = \sum_{i=1}^{m} x_i = x_1 + x_2 + \dots + x_m$$
24. How do ANNs work?
Not all inputs are equal
Figure: the same neuron, where each input xi reaches the processing unit (∑) through its own weight wi.
$$y = \sum_{i=1}^{m} x_i w_i = x_1 w_1 + x_2 w_2 + \dots + x_m w_m$$
25. How do ANNs work?
The signal is not passed down to the next neuron verbatim.
Figure: the same weighted neuron, with a transfer function (activation function) f(vk) applied to the weighted sum before it becomes the output y.
26. The output is a function of the input, which is
affected by the weights and the transfer
function
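The three steps built up on the preceding slides (sum the inputs, weight them, pass the result through a transfer function) can be sketched in Python as below. The function names and the example numbers are illustrative assumptions, and the logistic sigmoid is only one possible choice of transfer function.

```python
import math

def weighted_sum(inputs, weights):
    """Net input of one neuron: x1*w1 + x2*w2 + ... + xm*wm."""
    return sum(x * w for x, w in zip(inputs, weights))

def sigmoid(v):
    """Logistic transfer function (an assumed choice of f)."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(inputs, weights, transfer=sigmoid):
    """Output y = f(sum_i x_i * w_i)."""
    return transfer(weighted_sum(inputs, weights))

# Hypothetical example values, not taken from the slides.
x = [1.0, 0.5, -1.0]
w = [0.2, 0.4, 0.1]
print(weighted_sum(x, w))   # net input before the transfer function
print(neuron_output(x, w))  # output after the transfer function
```

Passing a different callable as `transfer` swaps the transfer function without touching the weighted sum.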
28. A Simple Model of a Neuron (Perceptron)
• Each neuron has a threshold value
• Each neuron has weighted inputs from other
neurons
• The input signals form a weighted sum
• If the activation level exceeds the threshold, the
neuron “fires”
Figure: inputs y1, y2, y3, …, yi reach the neuron through the weights w1j, w2j, w3j, …, wij; the neuron produces the output O.
29. An Artificial Neuron
• Each hidden or output neuron has weighted input
connections from each of the units in the preceding layer.
• The unit performs a weighted sum of its inputs, and
subtracts its threshold value, to give its activation level.
• The activation level is passed through a sigmoid activation
function to determine the output.
Figure: inputs y1, y2, y3, …, yi reach the neuron through the weights w1j, w2j, w3j, …, wij; the activation function f(x) produces the output O.
31. Perceptron Training
• A linear threshold is used.
• w - weight value
• t - threshold value
$$\text{Output} = \begin{cases} 1 & \text{if } \sum_{i=0} w_i x_i > t \\ 0 & \text{otherwise} \end{cases}$$
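32. Simple network
AND with a biased input
$$\text{output} = \begin{cases} 1 & \text{if } \sum_{i=0} w_i x_i > t \\ 0 & \text{otherwise} \end{cases}$$
Figure: a single unit with inputs X, Y and a constant bias input of -1; weights W1 = 1.5, W2 = 1, W3 = 1; threshold t = 0.0.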
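The AND network above can be checked with a few lines of Python. This is a minimal sketch that assumes W1 = 1.5 is the weight on the bias input -1 and that W2 and W3 are the weights on X and Y; the extracted figure does not show which weight belongs to which input, but other assignments of these values do not produce AND.

```python
def step_unit(x, y, w_bias=1.5, w_x=1.0, w_y=1.0, t=0.0):
    """Linear threshold unit with a constant bias input of -1."""
    net = w_bias * (-1) + w_x * x + w_y * y
    return 1 if net > t else 0

# Print the truth table produced by the unit.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, step_unit(a, b))
```

Only the input pair (1, 1) gives a net input above the threshold (1 + 1 - 1.5 = 0.5 > 0).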
33. Learning algorithm
While epoch produces an error
    Present network with next inputs from epoch
    Error = T - O
    If Error <> 0 then
        Wj = Wj + LR * Ij * Error
    End If
End While
34. Learning algorithm
Epoch : Presentation of the entire training set to the neural
network.
In the case of the AND function an epoch consists
of four sets of inputs being presented to the
network (i.e. [0,0], [0,1], [1,0], [1,1])
Error: The error value is the amount by which the value
output by the network differs from the target
value. For example, if we required the network to
output 0 and it output a 1, then Error = -1
35. Learning algorithm
Target Value, T : When we are training a network we not
only present it with the input but also with a value
that we require the network to produce. For
example, if we present the network with [1,1] for
the AND function, the target value will be 1
Output, O : The output value from the neuron
Ij : Inputs being presented to the neuron
Wj : Weight from input neuron (Ij) to the output neuron
LR : The learning rate. This dictates how quickly the
network converges. It is set by
experimentation. It is typically 0.1
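The learning algorithm and the terms defined above (epoch, Error = T - O, Ij, Wj, LR) can be put together in a short Python sketch. The function names are hypothetical, the weights start at zero rather than at random values (an assumption made so the run is reproducible), and a constant bias input of -1 is prepended as in the AND network with a biased input.

```python
def train_perceptron(samples, lr=0.1, threshold=0.0, max_epochs=100):
    """Perceptron rule: Wj = Wj + LR * Ij * Error, with Error = T - O."""
    n_inputs = len(samples[0][0]) + 1      # +1 for the bias input
    weights = [0.0] * n_inputs             # assumption: zeros, not random
    for epoch in range(max_epochs):
        had_error = False
        for inputs, target in samples:
            full = [-1.0] + list(inputs)   # bias input of -1 comes first
            net = sum(w * i for w, i in zip(weights, full))
            output = 1 if net > threshold else 0
            error = target - output
            if error != 0:
                had_error = True
                weights = [w + lr * i * error for w, i in zip(weights, full)]
        if not had_error:                  # an error-free epoch ends training
            return weights, epoch + 1
    return weights, max_epochs

def predict(weights, inputs, threshold=0.0):
    """Output of the trained unit for one input pattern."""
    full = [-1.0] + list(inputs)
    return 1 if sum(w * i for w, i in zip(weights, full)) > threshold else 0

# One epoch of the AND function: four input patterns with their targets.
AND_SET = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, epochs = train_perceptron(AND_SET)
print(weights, epochs)
print([predict(weights, x) for x, _ in AND_SET])
```

Because AND is linearly separable, the loop stops once a whole epoch passes without an error; for a function such as XOR it would run until `max_epochs`.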
36. Training Perceptrons
Figure: a single unit with inputs x, y and a constant bias input of -1; weights W1 = ?, W2 = ?, W3 = ?; threshold t = 0.0.
For AND:
A B Output
0 0 0
0 1 0
1 0 0
1 1 1
• What are the weight values?
• Initialize with random weight values
38. Learning in Neural Networks
• Learn values of weights from I/O pairs
• Start with random weights
• Load training example’s input
• Observe computed output
• Modify weights to reduce difference
• Iterate over all training examples
• Terminate when weights stop changing OR when error is very small
39. Artificial Neural Networks
An ANN can:
1. compute any computable function, by the appropriate
selection of the network topology and weights values.
2. learn from experience!
Specifically, by trial‐and‐error
40. Learning by trial‐and‐error
Continuous process of:
Trial:
Processing an input to produce an output (In terms of
ANN: Compute the output function of a given input)
Evaluate:
Evaluating this output by comparing the actual
output with the expected output.
Adjust:
Adjust the weights.
41. How does it work?
Set initial values of the weights randomly.
Input: truth table of XOR
Do
    Read input (e.g. 0 and 0)
    Compute an output (e.g. 0.60543)
    Compare it to the expected output (Diff = 0.60543)
    Modify the weights accordingly.
Loop until a condition is met
    Condition: certain number of iterations
    Condition: error threshold
42. Design Issues
Initial weights (small random values ∈[‐1,1])
Transfer function (How the inputs and the weights are
combined to produce output?)
Error estimation
Weights adjusting
Number of neurons
Data representation
Size of training set
43. Transfer Functions
Linear: The output is proportional to the total
weighted input.
Threshold: The output is set at one of two values,
depending on whether the total weighted input is
greater than or less than some threshold value.
Non‐linear: The output varies continuously but not
linearly as the input changes.
44. Error Estimation
The root mean square error (RMSE) is a frequently-
used measure of the differences between values
predicted by a model or an estimator and the values
actually observed from the thing being modeled or
estimated
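For n predicted values ŷ_i and observed values y_i, the RMSE is sqrt((1/n) Σ (ŷ_i − y_i)²). A minimal Python sketch follows; the function name and the sample numbers are illustrative assumptions.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between two equally long sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical network outputs compared with their targets.
print(rmse([0.9, 0.2, 0.7], [1.0, 0.0, 1.0]))
```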
45. Weights Adjusting
After each iteration, weights should be adjusted to
minimize the error.
– All possible weights
– Back propagation
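50. Feedforward Network
• Its output and input vectors are respectively $o = [o_1, o_2, \dots, o_m]^T$ and $x = [x_1, x_2, \dots, x_n]^T$
• Weight $w_{ij}$ connects the i’th neuron with the
j’th input. The activation rule of the i’th neuron is
$$o_i = f(\mathrm{net}_i), \quad \text{where } \mathrm{net}_i = \sum_{j=1}^{n} w_{ij} x_j$$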
52. Feedback network
When outputs are directed back as
inputs to same or preceding layer
nodes it results in the formation of
feedback networks
53. Lateral feedback
If the feedback of the output of the processing elements is directed back
as input to the processing elements in the same layer then it is called
lateral feedback
54. Recurrent networks
• Single node with own feedback
• Competitive nets
• Single-layer recurrent nets
• Multilayer recurrent networks
Feedback networks with a closed loop are called recurrent networks. The
response at the (k+1)’th instant depends on the entire history of the network
starting at k=0.
Automaton: A system with discrete-time inputs and a discrete data
representation is called an automaton
55. Basic models of ANN
The basic models of ANN are specified by:
• Interconnections
• Learning rules
• Activation function
56. Learning
• It’s a process by which a NN adapts itself to a stimulus by making
proper parameter adjustments, resulting in the production of desired
response
• Two kinds of learning
• Parameter learning:- connection weights are updated
• Structure Learning:- change in network structure
57. Training
• The process of modifying the weights in the connections between
network layers with the objective of achieving the expected output is
called training a network.
• This is achieved through
• Supervised learning
• Unsupervised learning
• Reinforcement learning
59. Supervised Learning
• Child learns from a teacher
• Each input vector requires a corresponding target vector.
• Training pair=[input vector, target vector]
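Figure: the input X is fed to the neural network (weights W), which produces the actual output Y. An error signal generator compares Y with the desired output D and feeds the error signals (D-Y) back to the network.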
61. Unsupervised Learning
• How a fish or tadpole learns
• All similar input patterns are grouped together as
clusters.
• If a matching input pattern is not found a new cluster
is formed
63. Self-organizing
• In unsupervised learning there is no feedback
• The network must itself discover patterns, regularities and features in the
input data, and relations between the input data and the output
• While doing so the network may change its parameters
• This process is called self-organizing
65. When is reinforcement learning used?
• When less information is available about the target output values (only
critic information)
• Learning based on this critic information is called reinforcement
learning, and the feedback sent is called the reinforcement signal
• Feedback in this case is only evaluative and not instructive
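67. Activation Function
1. Identity function:
$$f(x) = x \quad \text{for all } x$$
2. Binary step function (θ is the threshold):
$$f(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ 0 & \text{if } x < \theta \end{cases}$$
3. Bipolar step function:
$$f(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ -1 & \text{if } x < \theta \end{cases}$$
4. Sigmoidal functions: continuous functions
5. Ramp function:
$$f(x) = \begin{cases} 1 & \text{if } x > 1 \\ x & \text{if } 0 \le x \le 1 \\ 0 & \text{if } x < 0 \end{cases}$$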
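The activation functions listed on this slide can be written as small Python functions. This is a sketch under stated assumptions: the threshold parameter `theta` defaults to 0, and the logistic function stands in for the family of sigmoidal functions.

```python
import math

def identity(x):
    """f(x) = x for all x."""
    return x

def binary_step(x, theta=0.0):
    """1 at or above the threshold theta, otherwise 0."""
    return 1 if x >= theta else 0

def bipolar_step(x, theta=0.0):
    """1 at or above the threshold theta, otherwise -1."""
    return 1 if x >= theta else -1

def logistic(x, a=1.0):
    """One sigmoidal function; `a` is an assumed steepness parameter."""
    return 1.0 / (1.0 + math.exp(-a * x))

def ramp(x):
    """0 below 0, linear between 0 and 1, 1 above 1."""
    if x > 1:
        return 1
    if x < 0:
        return 0
    return x

for v in (-2.0, 0.0, 0.3, 2.0):
    print(v, identity(v), binary_step(v), bipolar_step(v), logistic(v), ramp(v))
```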
68. Activation functions
• Transforms neuron’s input into output.
• Features of activation functions:
• A squashing effect is required
• Prevents accelerating growth of activation
levels through the network.
• Simple and easy to calculate
69. Standard activation functions
• The hard-limiting threshold function
– Corresponds to the biological paradigm
• either fires or not
• Sigmoid functions ('S'-shaped curves)
– The logistic function
– The hyperbolic tangent (symmetrical)
– Both functions have a simple differential
– Only the shape is important
$$f(x) = \frac{1}{1 + e^{-ax}}$$
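The remark that both sigmoid functions have a simple differential can be made concrete: for the logistic function f'(x) = a f(x)(1 − f(x)), and for the hyperbolic tangent the derivative is 1 − tanh²(x). The sketch below compares both closed forms with a numerical derivative; the helper names are assumptions made for illustration.

```python
import math

def logistic(x, a=1.0):
    """Logistic function f(x) = 1 / (1 + e^(-a x))."""
    return 1.0 / (1.0 + math.exp(-a * x))

def logistic_derivative(x, a=1.0):
    """Closed form: f'(x) = a * f(x) * (1 - f(x))."""
    fx = logistic(x, a)
    return a * fx * (1.0 - fx)

def tanh_derivative(x):
    """Closed form: d/dx tanh(x) = 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

def numeric_derivative(f, x, h=1e-6):
    """Central difference, used only to check the closed forms."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-2.0, 0.0, 1.5):
    print(x, logistic_derivative(x), numeric_derivative(logistic, x))
    print(x, tanh_derivative(x), numeric_derivative(math.tanh, x))
```

Both derivatives reuse the function value itself, which is what makes them cheap to evaluate during training.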
70. Some learning algorithms we will learn are
• Supervised:
• Adaline, Madaline
• Perceptron
• Back Propagation
• multilayer perceptrons
• Radial Basis Function Networks
• Unsupervised
• Competitive Learning
• Kohonen self-organizing map
• Learning vector quantization
• Hebbian learning
71. Neural processing
• Recall: the processing phase of a NN, whose objective is to retrieve
information; that is, the process of computing o for a given x
• Basic forms of neural information processing
• Auto association
• Hetero association
• Classification
72. Neural processing-Autoassociation
• Set of patterns can be stored in
the network
• If a pattern similar to a member of
the stored set is presented, an
association with the input of
closest stored pattern is made
73. Neural Processing- Heteroassociation
• Associations between pairs of
patterns are stored
• Distorted input pattern may cause
correct heteroassociation at the
output
74. Neural processing-Classification
• Set of input patterns is divided
into a number of classes or
categories
• In response to an input pattern
from the set, the classifier is
supposed to recall the
information regarding class
membership of the input pattern.
75. Important terminologies of ANNs
• Weights
• Bias
• Threshold
• Learning rate
• Momentum factor
• Vigilance parameter
• Notations used in ANN
76. Weights
• Each neuron is connected to every other neuron by means of directed
links
• Links are associated with weights
• Weights contain information about the input signal and are
represented as a matrix
• The weight matrix is also called the connection matrix
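77. Weight matrix
$$W = \begin{bmatrix} w_1^T \\ w_2^T \\ w_3^T \\ \vdots \\ w_n^T \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & w_{13} & \dots & w_{1m} \\ w_{21} & w_{22} & w_{23} & \dots & w_{2m} \\ \vdots & \vdots & \vdots & & \vdots \\ w_{n1} & w_{n2} & w_{n3} & \dots & w_{nm} \end{bmatrix}$$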
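78. Weights contd…
• wij is the weight from processing element “i” (source node) to
processing element “j” (destination node)
Figure: inputs X1, …, Xi, …, Xn connect to the unit Yj through the weights w1j, …, wij, …, wnj; a constant input of 1 connects through the bias bj.
$$y_{inj} = \sum_{i=0}^{n} x_i w_{ij} = x_0 w_{0j} + x_1 w_{1j} + x_2 w_{2j} + \dots + x_n w_{nj} = w_{0j} + \sum_{i=1}^{n} x_i w_{ij}$$
$$y_{inj} = b_j + \sum_{i=1}^{n} x_i w_{ij}$$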
79. Activation Functions
• Used to calculate the output response of a neuron.
• An activation function is applied to the sum of the weighted input
signals to obtain the response.
• Activation functions can be linear or non-linear
• Already covered:
• Identity function
• Single/binary step function
• Discrete/continuous sigmoidal function.
80. Bias
• Bias is like another weight. It is included by adding a component x0=1
to the input vector X.
• X=(1,X1,X2…Xi,…Xn)
• Bias is of two types
• Positive bias: increase the net input
• Negative bias: decrease the net input
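81. Why is bias required?
• The relationship between input and output is given by
the equation of a straight line, y=mx+c, where the intercept c plays the role of the bias
Figure: the input X and the bias C produce the output Y, with y=mx+C.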
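82. Threshold
• A set value based upon which the final output of the
network may be calculated
• Used in the activation function
• The activation function using a threshold θ can be
defined as
$$f(\mathrm{net}) = \begin{cases} 1 & \text{if } \mathrm{net} \ge \theta \\ -1 & \text{if } \mathrm{net} < \theta \end{cases}$$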
83. Learning rate
• Denoted by α.
• Used to control the amount of weight adjustment at each step of
training
• Learning rate ranging from 0 to 1 determines the rate of learning in
each time step
84. Learning rate
• The learning rate defines the size of the corrective steps that the model
takes to adjust for errors in each observation.
• A high learning rate shortens the training time, but with lower ultimate
accuracy, while a lower learning rate takes longer, but with the potential for
greater accuracy.
• Optimizations such as Quickprop are primarily aimed at speeding up error
minimization, while other improvements mainly try to increase reliability.
• In order to avoid oscillation inside the network such as alternating
connection weights, and to improve the rate of convergence, refinements
use an adaptive learning rate that increases or decreases as appropriate.
• (From Wikipedia)
85. Learning Rate
• Neural networks are often trained by gradient descent on the
weights. This means at each iteration we use backpropagation to
calculate the derivative of the loss function with respect to each
weight and subtract it from that weight.
However, if you actually try that, the weights will change far too much
each iteration, which will make them “overcorrect”, and the loss will
actually increase/diverge. So in practice, people usually multiply each
derivative by a small value called the “learning rate” before they
subtract it from its corresponding weight.
• w1new = w1 − (learning rate) × (derivative of cost function wrt w1)
86. Learning Rate
• Stochastic gradient descent is an optimization algorithm that estimates the
error gradient for the current state of the model using examples from the
training dataset, then updates the weights of the model using the back-
propagation of errors algorithm, referred to as simply backpropagation.
• The amount that the weights are updated during training is referred to as
the step size or the “learning rate.”
• Specifically, the learning rate is a configurable hyperparameter used in the
training of neural networks that has a small positive value, often in the
range between 0.0 and 1.0.
• The learning rate is one of the most important hyper-parameters to tune
for training deep neural networks.
87. Learning Rate
• If the learning rate is low, then training is more reliable, but
optimization will take a lot of time because steps towards the
minimum of the loss function are tiny.
• If the learning rate is high, then training may not converge or even
diverge. Weight changes can be so big that the optimizer overshoots
the minimum and makes the loss worse.
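The effect of a low versus a high learning rate can be seen on a toy problem. The sketch below minimises the assumed loss function loss(w) = w², whose derivative is 2w, with plain gradient descent; the three learning rates are chosen only for illustration.

```python
def gradient_descent(lr, w0=1.0, steps=20):
    """Minimise loss(w) = w**2 using w <- w - lr * d(loss)/dw."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w        # derivative of w**2
        w = w - lr * grad     # learning rate scales the step
    return w

for lr in (0.01, 0.4, 1.1):
    w = gradient_descent(lr)
    print(f"lr={lr}: final w={w:.6f}, loss={w * w:.6f}")
```

With the lowest rate the weight creeps towards the minimum at 0, with the middle rate it gets there quickly, and with the highest rate every step overshoots by more than the last, so the loss grows.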
88. Learning Rate – Variable learning rate
• The learning rate need not be constant across all the layers of a neural
network; it may differ from layer to layer, which can help with the
vanishing gradient problem.
• In backpropagation, the weight change that reaches the first layers is a
product of many derivatives, each of which is typically smaller than 1,
so the product becomes very small.
• As a result the weights of the early layers may stop changing, and
learning saturates prematurely.
• Assigning a different learning rate to each layer is one way to counter this.
89. A systematic approach towards finding the optimal
learning rate
1. Start with a high learning rate and steadily decrease it. Changes in
the weight vector must be small in order to reduce oscillations or any
divergence.
2. A simple suggestion is to increase the learning rate when performance
improves and decrease the learning rate when performance
worsens.
3. Another method is to double the learning rate until the error value
worsens.
90. A systematic approach towards finding the optimal
learning rate
• Ultimately, we'd like a learning rate which results in a steep decrease
in the network's loss.
• We can observe this by performing a simple experiment where we
gradually increase the learning rate after each mini batch, recording
the loss at each increment.
• This gradual increase can be on either a linear or exponential scale.
91. • For learning rates which are too low, the loss may decrease, but at a
very shallow rate.
• When entering the optimal learning rate zone, you'll observe a quick
drop in the loss function. Increasing the learning rate further will
cause an increase in the loss as the parameter updates cause the loss
to "bounce around" and even diverge from the minima.
• Remember, the best learning rate is associated with the steepest
drop in loss, so we're mainly interested in analyzing the slope of the
plot.
93. Two types of learning
• 1. Sequential or per-pattern method
• 2. Batch or per-epoch method
• In sequential learning a given input pattern is propagated forward, the
error is determined and backpropagated, and the weights are updated.
• In batch learning the weights are updated only after the entire set of
training patterns has been presented to the network. Thus the weight
update is performed only once per epoch.
• If P is the number of patterns in one epoch, then
$$\Delta w = \frac{1}{P} \sum_{p=1}^{P} \Delta w_p$$
• This method has a smoothing effect.
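The difference between the two methods can be sketched for a single linear unit. The per-pattern weight change used here is an LMS-style rule, which is an assumption made for illustration; the patterns and the learning rate are hypothetical as well.

```python
def pattern_delta(w, x, target, lr):
    """Weight change for one pattern (LMS-style rule, assumed here)."""
    output = sum(wi * xi for wi, xi in zip(w, x))
    error = target - output
    return [lr * error * xi for xi in x]

def sequential_epoch(w, patterns, lr=0.1):
    """Update the weights immediately after every pattern."""
    w = list(w)
    for x, t in patterns:
        dw = pattern_delta(w, x, t, lr)
        w = [wi + d for wi, d in zip(w, dw)]
    return w

def batch_epoch(w, patterns, lr=0.1):
    """Average the per-pattern changes, then update once per epoch."""
    total = [0.0] * len(w)
    for x, t in patterns:
        dw = pattern_delta(w, x, t, lr)   # every change uses the same w
        total = [s + d for s, d in zip(total, dw)]
    p = len(patterns)
    return [wi + s / p for wi, s in zip(w, total)]

patterns = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 0.0)]
print(sequential_epoch([0.0, 0.0], patterns))
print(batch_epoch([0.0, 0.0], patterns))
```

The batch result is the mean of the individual changes, which is where the smoothing effect comes from.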
94. When to stop backpropagation?
• Continue as long as the error on the validation set decreases.
• When the validation error begins to increase, the net is starting to
memorise the training patterns, and training is terminated.
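This stopping rule can be sketched as a function over a list of per-epoch validation errors. The `patience` parameter is an added assumption that allows a few non-improving epochs before stopping; `patience=1` corresponds to stopping at the first increase, as described above. The error values are made up.

```python
def early_stopping_epoch(validation_errors, patience=1):
    """Return the index of the epoch whose weights should be kept."""
    best_epoch = 0
    best_error = float("inf")
    waited = 0
    for epoch, err in enumerate(validation_errors):
        if err < best_error:
            best_error, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break          # validation error stopped improving
    return best_epoch

# Hypothetical validation errors, one per epoch.
errors = [0.9, 0.6, 0.4, 0.45, 0.3]
print(early_stopping_epoch(errors))
print(early_stopping_epoch(errors, patience=2))
```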
95. How to choose hidden neurons
• There are many rule-of-thumb methods for determining an
acceptable number of neurons to use in the hidden layers, such as the
following:
1. The number of hidden neurons should be between the size of the
input layer and the size of the output layer.
2. The number of hidden neurons should be 2/3 the size of the input
layer, plus the size of the output layer.
3. The number of hidden neurons should be less than twice the size of
the input layer.
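The three rules of thumb can be turned into a small helper that returns a starting point for experimentation. The function name and the rounding choice are assumptions; the rules themselves are heuristics, not guarantees.

```python
def hidden_neuron_suggestions(n_inputs, n_outputs):
    """Apply the three rules of thumb to given layer sizes."""
    low, high = sorted((n_inputs, n_outputs))
    return {
        # Rule 1: between the output layer size and the input layer size.
        "between_input_and_output": (low, high),
        # Rule 2: 2/3 of the input layer size, plus the output layer size.
        "two_thirds_input_plus_output": round(2 * n_inputs / 3 + n_outputs),
        # Rule 3: largest whole number below twice the input layer size.
        "less_than_twice_input": 2 * n_inputs - 1,
    }

print(hidden_neuron_suggestions(9, 2))
```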
96. How to choose number of hidden layers
Number of hidden layers | Result
none | Only capable of representing linearly separable functions or decisions.
1 | Can approximate any function that contains a continuous mapping from one finite space to another.
2 | Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions and can approximate any smooth mapping to any accuracy.
>2 | Additional layers can learn complex representations (a sort of automatic feature engineering) for later layers.
97. Other terminologies
• Momentum factor:
• Convergence is made faster if a momentum factor is added to the weight
update process.
• Vigilance parameter:
• Denoted by ρ
• Used to control the degree of similarity required for patterns to be assigned
to the same cluster
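How a momentum factor enters the weight update can be sketched as follows. The slide does not give the formula, so the classical form is assumed here: the new weight change is the momentum factor times the previous change, minus the learning rate times the gradient. The loss function w² is a toy example.

```python
def momentum_step(w, grad, velocity, lr=0.1, momentum=0.9):
    """One update: the new change reuses part of the previous change."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

def minimise(momentum, steps=50, lr=0.01, w0=1.0):
    """Minimise the toy loss w**2 (gradient 2*w) from w0."""
    w, v = w0, 0.0
    for _ in range(steps):
        w, v = momentum_step(w, 2.0 * w, v, lr, momentum)
    return w

print(minimise(0.0))   # plain gradient descent
print(minimise(0.9))   # with a momentum factor of 0.9
```

On this example the run with momentum ends closer to the minimum after the same number of steps, which illustrates the faster convergence mentioned above.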
99. Hebbian Learning Rule
• The learning signal is equal to the neuron’s output
FEED FORWARD UNSUPERVISED LEARNING
100. Features of Hebbian Learning
• Feedforward unsupervised learning
• “When an axon of a cell A is near enough to excite a cell B and
repeatedly or persistently takes part in firing it, some growth
process or change takes place in one or both cells, increasing the
efficiency”
• If oi xj is positive, the weight increases; otherwise it decreases
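A single Hebbian update can be sketched as below: the weight change for input j is proportional to the product of the neuron's output o and the input xj. The learning constant `c`, the bipolar sign activation, and the example vectors are assumptions chosen for illustration.

```python
def sign(v):
    """Bipolar step activation (an assumed choice)."""
    return 1 if v >= 0 else -1

def hebbian_update(w, x, c=1.0, activation=sign):
    """w_j <- w_j + c * o * x_j, where o is the neuron's own output."""
    o = activation(sum(wi * xi for wi, xi in zip(w, x)))
    return [wi + c * o * xi for wi, xi in zip(w, x)]

w = [1.0, -1.0, 0.0]
x = [1.0, -2.0, 1.5]
print(hebbian_update(w, x))
```

No target value appears anywhere in the update, which is why the rule counts as unsupervised.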
101. Perceptron Learning rule
• Learning signal is the difference between the desired
and actual neuron’s response
• Learning is supervised
102. Delta Learning Rule
• Only valid for continuous activation function
• Used in supervised training mode
• Learning signal for this rule is called delta
• The aim of the delta rule is to minimize the error over all training patterns
103. Delta Learning Rule Contd.
Learning rule is derived from the condition of least squared error.
Calculating the gradient vector with respect to wi
Minimization of error requires the weight changes to be in the negative
gradient direction
104. Widrow-Hoff learning Rule
• Also called the least mean square (LMS) learning rule
• Introduced by Widrow (1962), used in supervised learning
• Independent of the activation function
• Special case of the delta learning rule in which the activation function is
the identity function, i.e. f(net)=net
• Minimizes the squared error between the desired output value di and neti
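A minimal sketch of the Widrow-Hoff rule for one linear unit: each weight moves in proportion to the difference between the desired output d and the net input. The learning rate, the number of epochs, and the data (generated from d = 2·x1 − x2) are assumptions.

```python
def lms_update(w, x, d, alpha=0.1):
    """w_j <- w_j + alpha * (d - net) * x_j, with net = sum_j w_j * x_j."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + alpha * (d - net) * xi for wi, xi in zip(w, x)]

def train_lms(samples, alpha=0.1, epochs=200):
    """Repeat the update over all samples for a fixed number of epochs."""
    w = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        for x, d in samples:
            w = lms_update(w, x, d, alpha)
    return w

# Hypothetical data generated by d = 2*x1 - 1*x2.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
print(train_lms(samples))
```

Because the data here is exactly linear, the weights approach the generating coefficients 2 and -1.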
106. Winner-Take-All Learning rule Contd…
• Can be explained for a layer of neurons
• Example of competitive learning and used for
unsupervised network training
• Learning is based on the premise that one of the
neurons in the layer has a maximum response due to
the input x
• This neuron is declared the winner with a weight
109. Neural Network: Weaknesses and Strengths
• Weaknesses
• Long training time
• Require a number of parameters typically best determined empirically, e.g., the
network topology or “structure.”
• Poor interpretability: Difficult to interpret the symbolic meaning behind the learned
weights and of “hidden units” in the network
• Strengths
• High tolerance to noisy data
• Ability to classify untrained patterns
• Well-suited for continuous-valued inputs and outputs
• Successful on an array of real-world data, e.g., hand-written letters
• Algorithms are inherently parallel
• Techniques have recently been developed for the extraction of rules from trained
neural networks
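110. Summary: A Multi-Layer Feed-Forward Neural Network
Figure: the input vector X enters the input layer, passes through the weights wij to the hidden layer, and the output layer emits the output vector.
Weight update at step k, with learning rate α:
$$w_j^{(k+1)} = w_j^{(k)} + \alpha \left( y_i - \hat{y}_i^{(k)} \right) x_{ij}$$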
111. Summary: How A Multi-Layer Neural Network Works
• The inputs to the network correspond to the attributes measured for each
training tuple
• Inputs are fed simultaneously into the units making up the input layer
• They are then weighted and fed simultaneously to a hidden layer
• The number of hidden layers is arbitrary, although usually only one
• The weighted outputs of the last hidden layer are input to units making up
the output layer, which emits the network's prediction
• The network is feed-forward: None of the weights cycles back to an input
unit or to an output unit of a previous layer
• From a statistical point of view, networks perform nonlinear regression:
Given enough hidden units and enough training samples, they can closely
approximate any function
112. Summary: Defining a Network Topology
• Decide the network topology: Specify # of units in the input
layer, # of hidden layers (if > 1), # of units in each hidden layer,
and # of units in the output layer
• Normalize the input values for each attribute measured in the
training tuples to [0.0, 1.0]
• One input unit per domain value, each initialized to 0
• For classification with more than two classes, one
output unit per class is used
• Once a network has been trained and its accuracy is
unacceptable, repeat the training process with a different
network topology or a different set of initial weights
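Normalizing an attribute to the range from 0.0 to 1.0 is commonly done with min-max scaling, which is the technique assumed in this sketch. The function name and the handling of a constant attribute are illustrative choices.

```python
def min_max_normalize(values):
    """Scale a list of attribute values linearly into [0.0, 1.0]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute is mapped to 0.0 everywhere.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical attribute values from a set of training tuples.
print(min_max_normalize([10, 20, 30]))
```

Each attribute is normalized separately, using its own minimum and maximum over the training tuples.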
113. Summary: Backpropagation
• Iteratively process a set of training tuples & compare the network's prediction
with the actual known target value
• For each training tuple, the weights are modified to minimize the mean
squared error between the network's prediction and the actual target value
• Modifications are made in the “backwards” direction: from the output layer,
through each hidden layer down to the first hidden layer, hence
“backpropagation”
• Steps
• Initialize weights to small random numbers, associated with biases
• Propagate the inputs forward (by applying activation function)
• Backpropagate the error (by updating weights and biases)
• Terminating condition (when error is very small, etc.)
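The steps listed above can be sketched for a very small network with two inputs, two hidden units and one output unit, all using the sigmoid function. The initial weights, the learning rate, the single training tuple and the fixed number of iterations are hypothetical; the error being reduced is half the squared difference between target and output.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Propagate the inputs forward; returns hidden outputs and the output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def backprop_step(x, target, w_hidden, b_hidden, w_out, b_out, lr=0.5):
    """One weight and bias update for one training tuple."""
    hidden, output = forward(x, w_hidden, b_hidden, w_out, b_out)
    # Error term of the output unit: sigmoid derivative times the error.
    delta_out = output * (1.0 - output) * (target - output)
    # Error terms of the hidden units, propagated back through w_out.
    delta_hidden = [h * (1.0 - h) * w * delta_out
                    for h, w in zip(hidden, w_out)]
    # Update weights and biases using the error terms computed above.
    new_w_out = [w + lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_b_out = b_out + lr * delta_out
    new_w_hidden = [[w + lr * d * xi for w, xi in zip(ws, x)]
                    for ws, d in zip(w_hidden, delta_hidden)]
    new_b_hidden = [b + lr * d for b, d in zip(b_hidden, delta_hidden)]
    return new_w_hidden, new_b_hidden, new_w_out, new_b_out

# Hypothetical small initial weights and biases for a 2-2-1 network.
params = ([[0.2, -0.3], [0.4, 0.1]], [-0.4, 0.2], [-0.3, -0.2], 0.1)
x, target = [1.0, 0.0], 1.0

_, before = forward(x, *params)
for _ in range(100):              # terminating condition: fixed iterations
    params = backprop_step(x, target, *params)
_, after = forward(x, *params)
print(before, after)
```

A real training run would loop over all training tuples and stop on an error threshold rather than a fixed count.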
114. Summary: Neuron: A Hidden/Output Layer Unit
• An n-dimensional input vector x is mapped into the variable y by means of the
scalar product and a nonlinear function mapping
• The inputs to the unit are outputs from the previous layer. They are multiplied by
their corresponding weights to form a weighted sum, which is added to the bias
associated with the unit. Then a nonlinear activation function is applied to it.
Figure: the input vector x = (x0, x1, …, xn) and the weight vector w = (w0, w1, …, wn) form a weighted sum, the bias μk is added, and the activation function f produces the output y.
For example:
$$y = \mathrm{sign}\left( \sum_{i=0}^{n} w_i x_i + \mu_k \right)$$
115. Summary: Efficiency and Interpretability
• Efficiency of backpropagation: Each epoch (one iteration through the training
set) takes O(|D| * w), with |D| tuples and w weights, but the # of epochs can be
exponential in n, the number of inputs, in the worst case
• For easier comprehension: Rule extraction by network pruning
• Simplify the network structure by removing weighted links that have the
least effect on the trained network
• Then perform link, unit, or activation value clustering
• The set of input and activation values are studied to derive rules
describing the relationship between the input and hidden unit layers
• Sensitivity analysis: assess the impact that a given input variable has on a
network output. The knowledge gained from this analysis can be represented
in rules