1
Data Mining:
Concepts and Techniques
(3rd ed.)
— Chapter 9 —
Classification: Advanced Methods
Jiawei Han, Micheline Kamber, and Jian Pei
University of Illinois at Urbana-Champaign &
Simon Fraser University
©2011 Han, Kamber & Pei. All rights reserved.
2
Chapter 9. Classification: Advanced Methods
• Classification by Backpropagation
3
Classification by Backpropagation
• Backpropagation: A neural network learning algorithm
• Started by psychologists and neurobiologists to develop and
test computational analogues of neurons
• A neural network: A set of connected input/output units where
each connection has a weight associated with it
• During the learning phase, the network learns by adjusting the
weights so as to be able to predict the correct class label of the
input tuples
• Also referred to as connectionist learning due to the
connections between units
4
Neural Network as a Classifier
• Weakness
• Long training time
• Require a number of parameters typically best determined empirically,
e.g., the network topology or “structure.”
• Poor interpretability: Difficult to interpret the symbolic meaning behind
the learned weights and of “hidden units” in the network
• Strength
• High tolerance to noisy data
• Ability to classify untrained patterns
• Well-suited for continuous-valued inputs and outputs
• Successful on an array of real-world data, e.g., hand-written letters
• Algorithms are inherently parallel
• Techniques have recently been developed for the extraction of rules from
trained neural networks
5
Neuron: A Hidden/Output Layer Unit
• An n-dimensional input vector x is mapped into a variable y by means of the
scalar product and a nonlinear function mapping
• The inputs to the unit are the outputs of the previous layer. They are multiplied by
their corresponding weights to form a weighted sum, to which the bias associated
with the unit is added; a nonlinear activation function is then applied to the result.
• [Figure: a single unit with input vector x = (x0, x1, …, xn), weight vector
w = (w0, w1, …, wn), a weighted sum combined with the bias $\mu_k$, an
activation function f, and the output y]
• Example: $y = \mathrm{sign}\left(\sum_{i=0}^{n} w_i x_i - \mu_k\right)$, where $\mu_k$ is the bias (threshold) of unit k
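Such a unit is easy to write out directly. Below is a minimal Python sketch of one hidden/output unit, using the additive-bias convention of the later slides and, for illustration, the weights of hidden unit 4 from the worked example later in the deck (the function name is ours, not from the text):

import math

def unit_output(x, w, bias, activation="sigmoid"):
    # One hidden/output unit: weighted sum of the inputs plus the bias,
    # passed through a nonlinear activation function.
    s = sum(wi * xi for wi, xi in zip(w, x)) + bias
    if activation == "sign":                # hard-threshold (perceptron-style) unit
        return 1 if s > 0 else -1
    return 1.0 / (1.0 + math.exp(-s))       # logistic (sigmoid) unit

print(unit_output([1, 0, 1], [0.6, 0.2, -0.4], 0.1, "sign"))     # -> 1
print(unit_output([1, 0, 1], [0.6, 0.2, -0.4], 0.1, "sigmoid"))  # -> ~0.574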
6
A Neuron (= a perceptron)
• The n-dimensional input vector x is mapped into a variable y by means of the
scalar product and a nonlinear function mapping
• [Figure: the same single-unit structure: input vector x, weight vector w, a
weighted sum with bias $\mu_k$, an activation function f, and the output y]
• Example: $y = \mathrm{sign}\left(\sum_{i=0}^{n} w_i x_i - \mu_k\right)$
7
A Multi-Layer Feed-Forward Neural Network
• [Figure: the input layer receives the input vector X; its weighted outputs feed a
hidden layer, whose weighted outputs in turn feed the output layer, which emits
the output vector; wij denotes the weight on the connection from unit i to unit j]
• Iterative weight update: $w_j^{(k+1)} = w_j^{(k)} + \lambda\,\big(y_i - \hat{y}_i^{(k)}\big)\,x_{ij}$,
i.e., each weight is adjusted in proportion to the prediction error on the training
tuple, with learning rate $\lambda$
8
Backpropagation
• Iteratively process a set of training tuples & compare the network's prediction
with the actual known target value
• For each training tuple, the weights are modified to minimize the mean
squared error between the network's prediction and the actual target value
• Modifications are made in the “backwards” direction: from the output layer,
through each hidden layer down to the first hidden layer, hence
“backpropagation”
• Steps
• Initialize the weights (to small random numbers) and the associated biases
• Propagate the inputs forward (by applying activation function)
• Backpropagate the error (by updating weights and biases)
• Terminating condition (when error is very small, etc.)
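As a concrete illustration of these steps, here is a minimal Python sketch for a network with one hidden layer trained on a single tuple; the layer sizes, the training tuple, and the stopping threshold are illustrative assumptions, not taken from the slides:

import math, random

random.seed(0)
sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))

n_in, n_hid = 3, 2
# Step 1: initialize weights and biases to small random numbers
w_hid = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
b_hid = [random.uniform(-0.5, 0.5) for _ in range(n_hid)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(n_hid)]
b_out = random.uniform(-0.5, 0.5)

x, target, lr = [1, 0, 1], 1.0, 0.4   # one training tuple and its target

for epoch in range(5000):
    # Step 2: propagate the inputs forward (weighted sums + sigmoid activation)
    o_hid = [sigmoid(sum(w * xi for w, xi in zip(w_hid[j], x)) + b_hid[j])
             for j in range(n_hid)]
    o_out = sigmoid(sum(w * o for w, o in zip(w_out, o_hid)) + b_out)

    # Step 3: backpropagate the error, then update weights and biases
    err_out = o_out * (1 - o_out) * (target - o_out)
    err_hid = [o_hid[j] * (1 - o_hid[j]) * err_out * w_out[j] for j in range(n_hid)]
    for j in range(n_hid):
        for i in range(n_in):
            w_hid[j][i] += lr * err_hid[j] * x[i]
        b_hid[j] += lr * err_hid[j]
        w_out[j] += lr * err_out * o_hid[j]
    b_out += lr * err_out

    # Step 4: terminating condition -- stop when the error is very small
    if abs(target - o_out) < 0.05:
        print("stopped after", epoch + 1, "epochs; output =", round(o_out, 3))
        break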
9
How Does a Multi-Layer Neural Network Work?
• The inputs to the network correspond to the attributes measured for each
training tuple
• Inputs are fed simultaneously into the units making up the input layer
• They are then weighted and fed simultaneously to a hidden layer
• The number of hidden layers is arbitrary, although usually only one
• The weighted outputs of the last hidden layer are input to units making up
the output layer, which emits the network's prediction
• The network is feed-forward in that none of the weights cycles back to an
input unit or to an output unit of a previous layer
• From a statistical point of view, networks perform nonlinear regression:
Given enough hidden units and enough training samples, they can closely
approximate any function
10
Sequence of Computations
1. Input and output processing at each hidden-layer or output-layer unit j, where i
ranges over the units in the previous layer connected to j:
   $I_j = \sum_i w_{ij}\,O_i + \theta_j$,   $O_j = \frac{1}{1 + e^{-I_j}}$
2. Error calculation at each unit j in the output layer, where Tj is the target output
and Oj is the computed output of the jth output unit:
   $Err_j = O_j\,(1 - O_j)\,(T_j - O_j)$
3. Error computation at a hidden-layer unit j, where wjk is the weight of the
connection from unit j to a unit k in the next higher layer, and Errk is the error of
unit k:
   $Err_j = O_j\,(1 - O_j)\sum_k Err_k\,w_{jk}$
4. Weight and bias update, with learning rate l:
   $w_{ij} = w_{ij} + (l)\,Err_j\,O_i$,   $\theta_j = \theta_j + (l)\,Err_j$
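These four computations can be transcribed almost directly into small Python functions (a sketch; the function names are illustrative):

import math

def unit_output(inputs, weights, bias):
    # Step 1: net input I_j = sum_i(w_ij * O_i) + theta_j, then O_j = 1 / (1 + e^(-I_j))
    i_j = sum(w * o for w, o in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-i_j))

def output_error(o_j, t_j):
    # Step 2: error of an output unit, Err_j = O_j * (1 - O_j) * (T_j - O_j)
    return o_j * (1 - o_j) * (t_j - o_j)

def hidden_error(o_j, downstream_errors, downstream_weights):
    # Step 3: error of a hidden unit, Err_j = O_j * (1 - O_j) * sum_k(Err_k * w_jk)
    return o_j * (1 - o_j) * sum(e * w for e, w in zip(downstream_errors, downstream_weights))

def updated(old_value, err_j, o_i, lr):
    # Step 4: w_ij <- w_ij + l * Err_j * O_i; pass o_i = 1 to get the bias update
    # theta_j <- theta_j + l * Err_j
    return old_value + lr * err_j * o_i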
Example of Backpropagation Algorithm
Consider a neural network with three inputs, one output, and two hidden-layer nodes:
• [Figure: units 1, 2, 3 form the input layer (receiving x1, x2, x3), units 4 and 5 form
the hidden layer, and unit 6 is the output unit; the connections carry the weights
w14, w24, w34, w15, w25, w35, w46 and w56, and dotted arrows indicate the
threshold (bias) values 𝜃4, 𝜃5, 𝜃6]
• The computed output of the network, O6, is compared with T6, the target output or
class label corresponding to the given input vector (x1, x2, x3)
Feedforward Phase
Using $I_j = \sum_i w_{ij}\,O_i + \theta_j$ and $O_j = \frac{1}{1 + e^{-I_j}}$ with the input vector x1 = 1, x2 = 0, x3 = 1:

Hidden unit 4: weights w14 = 0.6, w24 = 0.2, w34 = -0.4 and bias 𝜃4 = 0.1 give
I4 = 0.3 and O4 = 0.57443; weight to the output unit: w46 = -0.8

Hidden unit 5: weights w15 = 0.1, w25 = 0.7, w35 = 0.3 and bias 𝜃5 = 0.7 give
I5 = 1.1 and O5 = 0.75026; weight to the output unit: w56 = 0.4

Output unit 6: with bias 𝜃6 = -0.4, I6 = -0.5594 and O6 = 0.3637
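A few lines of Python (a sketch) reproduce these feedforward numbers:

import math

sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
x1, x2, x3 = 1, 0, 1

i4 = 0.6 * x1 + 0.2 * x2 + (-0.4) * x3 + 0.1   # I4 = 0.3
o4 = sigmoid(i4)                               # O4 ~ 0.57443
i5 = 0.1 * x1 + 0.7 * x2 + 0.3 * x3 + 0.7      # I5 = 1.1
o5 = sigmoid(i5)                               # O5 ~ 0.75026
i6 = -0.8 * o4 + 0.4 * o5 + (-0.4)             # I6 ~ -0.5594
o6 = sigmoid(i6)                               # O6 ~ 0.3637
print(i4, o4, i5, o5, i6, o6)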
Error at Output Node
$Err_j = O_j\,(1 - O_j)\,(T_j - O_j)$
Err6 = O6(1-O6)(T6 – O6)
Given T6 = 1 corresponding to the given input vector (x1, x2, x3)
Hence Err6 = 0.1473
Let l (learning rate) = 0.4
Backpropagation of Error to Hidden Layer
$Err_j = O_j\,(1 - O_j)\sum_k Err_k\,w_{jk}$
Err4 = O4(1-O4)Err6w46
Err4 = -0.02881
Err5 = O5(1-O5)Err6w56
Err5 = 0.01104
The error from all output units k is backpropagated to each unit j of the previous
layer (here j = 4 and j = 5); since our example has only one output unit k = 6, the
summation reduces to a single term
Weight and bias updates
• Weight updates will happen at all the edges (connections) which are in the network: from input layer
to hidden layer, and hidden layer to output layer
If there is an edge between any two nodes i and j:
Let l (learning rate) = 0.4
w14 = w14 + l*Err4*O1 = 0.6 + 0.4*(-0.02881)*1 = 0.58848 (Note: O1 = x1 = 1, as there is no processing at an input node)
w15 = 0.1044, w24 = 0.2 and w25 = 0.7 (no weight update, as O2 = x2 = 0), w34 = -0.41152, w35 = 0.30442,
w46 = -0.76615, w56 = 0.44421
• Bias updates will happen only at nodes where bias is present: hidden and output nodes
𝜃4 = 0.08848, 𝜃5 = 0.70442, 𝜃6 = -0.34108
• Update equations: $w_{ij} = w_{ij} + (l)\,Err_j\,O_i$ for the weight on the edge from
unit i to unit j, and $\theta_j = \theta_j + (l)\,Err_j$ for the bias of unit j
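The error and update computations of this example can be checked in the same way; a short sketch, starting from the unit outputs obtained in the feedforward phase (values in the comments are rounded):

o4, o5, o6 = 0.57443, 0.75026, 0.3637
t6, lr = 1.0, 0.4

err6 = o6 * (1 - o6) * (t6 - o6)         # ~ 0.1473
err4 = o4 * (1 - o4) * err6 * (-0.8)     # Err6 * w46 ~ -0.0288
err5 = o5 * (1 - o5) * err6 * 0.4        # Err6 * w56 ~ 0.0110

w14 = 0.6 + lr * err4 * 1                # ~ 0.5885  (O1 = x1 = 1)
w34 = -0.4 + lr * err4 * 1               # ~ -0.4115 (O3 = x3 = 1)
w46 = -0.8 + lr * err6 * o4              # ~ -0.7662
w56 = 0.4 + lr * err6 * o5               # ~ 0.4442
theta4 = 0.1 + lr * err4                 # ~ 0.0885
theta6 = -0.4 + lr * err6                # ~ -0.3411
print(err6, err4, err5, w14, w34, w46, w56, theta4, theta6)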
You can play around with the learning rate and see the variation….
l (learning rate) = 0.4

Errors              Weights              Biases
Node 6   0.147254   w14    0.588476     Theta4   0.088476
Node 4  -0.028807   w15    0.1044       Theta5   0.704416
Node 5   0.01104    w34   -0.411524     Theta6  -0.34108
                    w35    0.304416
                    w46   -0.76615
                    w56    0.444205
Backpropagation of Error to Hidden Layer (general
case, where there is more than one output node)
$Err_j = O_j\,(1 - O_j)\sum_k Err_k\,w_{jk}$
General architecture of an ANN:
• [Figure: a hidden unit j, with output Oj and error Errj, connected by weights wjk
to every output unit k, each with error Errk]
• The error from all output units k is backpropagated to each unit j of the previous
layer; in our example there is only one output unit k, so the summation reduces to
a single term
More general case: if there were another hidden layer
• [Figure: input layer, hidden layer H1 (containing unit j), hidden layer H2
(containing units k), and the output layer]
• By the same process, the error computation happens in H1 after H2
• Error computation sequence:
• First the output layer (all nodes)
• Then H2 (all nodes)
• Then H1 (all nodes)
• But there is no error computation at the input nodes (why?)
20
Defining a Network Topology
• Decide the network topology: Specify # of units in the input
layer, # of hidden layers (if > 1), # of units in each hidden layer,
and # of units in the output layer
• Normalize the input values for each attribute measured in the
training tuples to [0.0, 1.0]
• For a discrete-valued attribute, one input unit per domain value can be used,
each initialized to 0
• For the output: if the network is used for classification with more than two
classes, one output unit per class is used
• If a trained network's accuracy is unacceptable, repeat the training process
with a different network topology or a different set of initial weights
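For instance, min-max normalization of an attribute to [0.0, 1.0] and a one-output-unit-per-class encoding can be sketched as follows (the attribute values and class names below are made up for illustration):

def min_max_normalize(values):
    # Scale a list of attribute values into [0.0, 1.0]
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def one_hot(label, classes):
    # One output unit per class: 1 for the tuple's class, 0 for all others
    return [1 if c == label else 0 for c in classes]

print(min_max_normalize([12000, 45000, 98000]))                 # -> [0.0, ~0.384, 1.0]
print(one_hot("setosa", ["setosa", "versicolor", "virginica"])) # -> [1, 0, 0]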
21
Efficiency and Interpretability
• Efficiency of backpropagation: Each epoch (one iteration through the training
set) takes O(|D| * w) time, with |D| tuples and w weights, but the number of epochs
can be exponential in n, the number of inputs, in the worst case
• For easier comprehension: Rule extraction by network pruning
• Simplify the network structure by removing weighted links that have the
least effect on the trained network
• Then perform link, unit, or activation value clustering
• The set of input and activation values are studied to derive rules
describing the relationship between the input and hidden unit layers
• Sensitivity analysis: assess the impact that a given input variable has on a
network output. The knowledge gained from this analysis can be represented
in rules
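One simple way to carry out such a sensitivity analysis is to perturb one input at a time and observe how much the network output changes. A sketch (the toy_net stand-in below is an assumed fixed unit, not a trained network from the text):

import math

def sensitivity(predict, x, delta=0.1):
    # Vary each input by +delta and record the resulting change in the output
    base = predict(x)
    impacts = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] += delta
        impacts.append(abs(predict(perturbed) - base))
    return impacts  # a larger value means input i has more influence on the output

toy_net = lambda x: 1.0 / (1.0 + math.exp(-(0.6 * x[0] + 0.2 * x[1] - 0.4 * x[2] + 0.1)))
print(sensitivity(toy_net, [1.0, 0.0, 1.0]))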
