cs621-lect18-feedforward-network-contd-2009-9-24.ppt

CS621: Artificial Intelligence
Lecture 18: Feedforward network
contd
Pushpak Bhattacharyya
Computer Science and Engineering
Department
IIT Bombay

Pocket Algorithm
• Algorithm evolved in 1985 – essentially uses
PTA
• Basic Idea:
 Always preserve the best weight obtained so far
in the “pocket”
 Change weights, if found better (i.e. changed
weights result in reduced error).

XOR using 2 layers
  
)))
),
(
(
)),
(
,
(
( 2
1
2
1
2
1
2
1
2
1
x
x
NOT
AND
x
NOT
x
AND
OR
x
x
x
x
x
x



• Non-LS function expressed as a linearly separable
function of individual linearly separable functions.

Example - XOR
x1 x2 x1x
2
0 0 0
0 1 1
1 0 0
1 1 0
w2=1.5
w1=-1
θ = 1
x1 x2 








2
1
1
2
0
w
w
w
w
 Calculation of XOR
Calculation of x1x2
w2=1
w1=1
θ = 0.5
x1x2 x1x2

Example - XOR
w2=1
w1=1
θ = 0.5
x1x2 x1x2
-1
x1 x2
-1
1.5
1.5
1 1

Some Terminology
• A multilayer feedforward neural network has
– Input layer
– Output layer
– Hidden layer (asserts computation)
Output units and hidden units are called
computation units.

Training of the MLP
• Multilayer Perceptron (MLP)
• Question:- How to find weights for the hidden
layers when no target output is available?
• Credit assignment problem – to be solved by
“Gradient Descent”

Gradient Descent Technique
• Let E be the error at the output layer
• ti = target output; oi = observed output
• i is the index going over n neurons in the outermost
layer
• j is the index going over the p patterns (1 to p)
• Ex: XOR:– p=4 and n=1

 


p
j
n
i
j
i
i o
t
E
1 1
2
)
(
2
1

Weights in a ff NN
• wmn is the weight of the
connection from the nth neuron to
the mth neuron
• E vs surface is a complex
surface in the space defined by the
weights wij
• gives the direction in which a
movement of the operating point
in the wmn co-ordinate space will
result in maximum decrease in
error
W
m
n
wmn
mn
w
E



mn
mn
w
E
w






Sigmoid neurons
• Gradient Descent needs a derivative computation
- not possible in perceptron due to the discontinuous step
function used!
 Sigmoid neurons with easy-to-compute derivatives used!
• Computing power comes from non-linearity of sigmoid
function.






x
y
x
y
as
0
as
1

Derivative of Sigmoid function
)
1
(
1
1
1
1
1
)
1
(
)
(
)
1
(
1
1
1
2
2
y
y
e
e
e
e
e
e
dx
dy
e
y
x
x
x
x
x
x
x




























Training algorithm
• Initialize weights to random values.
• For input x = <xn,xn-1,…,x0>, modify weights as follows
Target output = t, Observed output = o
• Iterate until E <  (threshold)
i
i
w
E
w





2
)
(
2
1
o
t
E 


Calculation of ∆wi
i
i
i
i
i
i
n
i
i
i
i
i
x
o
o
o
t
w
w
E
w
x
o
o
o
t
W
net
net
o
o
E
x
w
net
where
W
net
net
E
W
E
)
1
(
)
(
)
1
0
constant,
learning
(
)
1
(
)
(
:
1
0

























 





















Observations
Does the training technique support our
intuition?
• The larger the xi, larger is ∆wi
– Error burden is borne by the weight values
corresponding to large input values

Backpropagation on feedforward
network

Backpropagation algorithm
• Fully connected feed forward network
• Pure FF network (no jumping of connections
over layers)
Hidden layers
Input layer
(n i/p neurons)
Output layer
(m o/p neurons)
j
i
wji
….
….
….
….

Gradient Descent Equations
i
ji
j
ji
j
th
j
ji
j
j
ji
ji
ji
jo
w
net
j
w
j
net
E
net
w
net
net
E
w
E
w
E
w
































)
layer
j
at the
input
(
)
1
0
rate,
learning
(

Backpropagation – for outermost
layer
i
j
j
j
j
ji
j
j
j
j
m
p
p
p
th
j
j
j
j
j
o
o
o
o
t
w
o
o
o
t
j
o
t
E
net
net
o
o
E
net
E
j
)
1
(
)
(
))
1
(
)
(
(
Hence,
)
(
2
1
)
layer
j
at the
input
(
1
2





























Backpropagation for hidden layers
Hidden layers
Input layer
(n i/p neurons)
Output layer
(m o/p neurons)
j
i
….
….
….
….
k
k is propagated backwards to find value of j

Backpropagation – for hidden
layers
i
j
j
k
k
kj
j
j
k
kj
k
j
j
j
k j
k
j
j
j
j
j
j
j
i
ji
o
o
o
w
o
o
w
o
o
o
netk
net
E
o
o
o
E
net
o
o
E
net
E
j
jo
w
)
1
(
)
(
)
1
(
)
(
Hence,
)
1
(
)
(
)
1
(
layer
next
layer
next
layer
next
















































General Backpropagation Rule
i
j
j
k
k
kj o
o
o
w )
1
(
)
(
layer
next

 


)
1
(
)
( j
j
j
j
j o
o
o
t 



i
ji jo
w 


• General weight updating rule:
• Where
for outermost layer
for hidden layers

How does it work?
• Input propagation forward and error
propagation backward (e.g. XOR)
w2=1
w1=1
θ = 0.5
x1x2 x1x2
-1
x1 x2
-1
1.5
1.5
1 1

cs621-lect18-feedforward-network-contd-2009-9-24.ppt

Recommended

Recommended

More Related Content

Similar to cs621-lect18-feedforward-network-contd-2009-9-24.ppt

Similar to cs621-lect18-feedforward-network-contd-2009-9-24.ppt (20)

More from GayathriRHICETCSESTA

More from GayathriRHICETCSESTA (20)

Recently uploaded

Recently uploaded (20)

cs621-lect18-feedforward-network-contd-2009-9-24.ppt