An Introduction to
Artificial Neural Networks
Dr Iman Ardekani
Content
- Biological Neurons
- Modelling Neurons
- McCulloch-Pitts Neuron
- Network Architecture
- Learning Processes
- Perceptron
- Linear Neuron
- Multilayer Perceptron
Biological Neurons
Ramón y Cajal (Spanish scientist, 1852-1934):
1. The brain is composed of individual cells called neurons.
2. Neurons are connected to each other by synapses.
Biological Neurons
Neuron Structure (Biology)
[Figure: a neuron with its dendrites, cell body, nucleus, axon, and axon terminals labeled]
Synaptic Junction (Biology)
[Figure: the synapse where one neuron's axon terminal meets another neuron]
Biological Neurons
Neuron Function (Biology)
1. Dendrites receive signals from other neurons.
2. Neurons can process (or transfer) the received signals.
3. Axon terminals deliver the processed signals to other tissues.
What kind of signals? Electrical impulse signals.
Biological Neurons
Modelling Neurons (Computer Science)
[Figure: a neuron abstracted as a block diagram: Input → Process → Outputs]
Biological Neurons
Modelling Neurons
The net input signal is a linear combination of the input signals xi.
Each output is a function of the net input signal.
[Figure: a neuron with inputs x1, x2, …, x10 and output y]
Modelling Neurons
- McCulloch and Pitts (1943): introduced the idea of neural networks as computing machines.
- Hebb (1949): invented the first rule for self-organized learning.
- Rosenblatt (1958): proposed the perceptron as the first model for learning with a teacher.
Modelling Neurons
The net input signal received through the synaptic junctions is
net = b + Σ wi xi = b + WᵀX
Weight vector: W = [w1 w2 … wm]ᵀ
Input vector: X = [x1 x2 … xm]ᵀ
Each output is a function of the net stimulus signal (f is called the activation function):
y = f(net) = f(b + WᵀX)
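To make this model concrete, here is a minimal sketch in Python/NumPy (the function name neuron_output and the sample numbers are ours, not from the slides) that evaluates y = f(b + WᵀX) for an arbitrary activation function f:

import numpy as np

def neuron_output(x, w, b, f):
    # General neuron model: y = f(b + sum_i w_i * x_i) = f(b + W^T X)
    net = b + np.dot(w, x)      # net input signal
    return f(net)               # activation function applied to the net input

x = np.array([0.5, -1.0, 2.0])  # input vector X
w = np.array([0.2, 0.4, 0.1])   # weight vector W
b = 0.1                         # bias
print(neuron_output(x, w, b, lambda net: net))  # linear activation: prints 0.0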
Modelling Neurons
General Model for Neurons
y = f(b + Σ wn xn)
[Figure: inputs x1, x2, …, xm weighted by w1, w2, …, wm, plus bias b, feed a summing junction Σ followed by the activation function f, producing output y]
Modelling Neurons
Activation Functions
[Figure: three activation functions plotted with outputs between -1 and 1]
- Threshold function / hard limiter: good for classification.
- Linear function: simple computation.
- Sigmoid function: continuous and differentiable.
Modelling Neurons
Sigmoid Function
f(x) = 1 / (1 + e^(-ax))
The sigmoid approaches the threshold function as a goes to infinity.
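A small sketch (our own helper names) comparing the sigmoid with the hard limiter; as the slope parameter a grows, the sigmoid outputs approach the 0/1 values of the threshold function:

import numpy as np

def sigmoid(x, a=1.0):
    # f(x) = 1 / (1 + exp(-a*x))
    return 1.0 / (1.0 + np.exp(-a * x))

def threshold(x):
    # Hard limiter: 1 if x >= 0, else 0
    return (x >= 0).astype(float)

x = np.array([-2.0, -0.5, 0.5, 2.0])
for a in (1, 5, 50):
    print("a =", a, sigmoid(x, a).round(3))   # outputs move toward 0 or 1 as a grows
print("threshold", threshold(x))              # [0. 0. 1. 1.]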
Modelling Neurons
McCulloch-Pitts Neuron
With a hard-limiter activation the output is binary: y ∈ {0, 1}.
[Figure: inputs x1, x2, …, xm weighted by w1, w2, …, wm, plus bias b, feed a summing junction followed by a hard limiter, producing output y]
Modelling Neurons
With a continuous (e.g. sigmoid) activation the output lies in a range: y ∈ [0, 1].
[Figure: the same neuron structure with a continuous activation function]
McCulloch-Pitts Neuron
Single-input McCulloch-Pitts neurode with b=0, w1=-1 for binary inputs:

x1   net   y
0     0    1
1    -1    0

Conclusion?
[Figure: a single-input neuron with weight w1 and bias b]
McCulloch-Pitts Neuron
Two-input McCulloch-Pitts neurode with b=-1, w1=w2=1 for binary inputs:

x1   x2   net   y
0    0     ?    ?
0    1     ?    ?
1    0     ?    ?
1    1     ?    ?

[Figure: a two-input neuron with weights w1, w2 and bias b]
McCulloch-Pitts Neuron
Two-input McCulloch-Pitts neurode with b=-1, w1=w2=1 for binary inputs:

x1   x2   net   y
0    0    -1    0
0    1     0    1
1    0     0    1
1    1     1    1

[Figure: a two-input neuron with weights w1, w2 and bias b]
McCulloch-Pitts Neuron
Two-input McCulloch-Pitts neurode with b=-2, w1=w2=1 for binary inputs:

x1   x2   net   y
0    0     ?    ?
0    1     ?    ?
1    0     ?    ?
1    1     ?    ?

[Figure: a two-input neuron with weights w1, w2 and bias b]
McCulloch-Pitts Neuron
Two-input McCulloch-Pitts neurode with b=-2, w1=w2=1 for binary inputs:

x1   x2   net   y
0    0    -2    0
0    1    -1    0
1    0    -1    0
1    1     0    1

[Figure: a two-input neuron with weights w1, w2 and bias b]
McCulloch-Pitts Neuron
Every basic Boolean function can be implemented using combinations of McCulloch-Pitts neurons.
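A minimal sketch of this idea (our own helper names; the hard limiter is taken to output 1 when the net input is non-negative, which matches the tables above):

def mp_neuron(inputs, weights, b):
    # McCulloch-Pitts neurode: hard limiter applied to net = b + sum(w_i * x_i)
    net = b + sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= 0 else 0

def NOT(x):       return mp_neuron([x], [-1], b=0)          # b=0,  w1=-1
def OR(x1, x2):   return mp_neuron([x1, x2], [1, 1], b=-1)  # b=-1, w1=w2=1
def AND(x1, x2):  return mp_neuron([x1, x2], [1, 1], b=-2)  # b=-2, w1=w2=1

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", AND(x1, x2), "OR:", OR(x1, x2), "NOT x1:", NOT(x1))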
McCulloch-Pitts Neuron
The McCulloch-Pitts neuron can be used as a classifier that separates the input signals into two classes (perceptron):
[Figure: points in the (x1, x2) plane, Class A (y=1) and Class B (y=0), separated by a straight line]
Class A ⇒ y = 1 ⇒ net ≥ 0 ⇒ b + w1x1 + w2x2 ≥ 0
Class B ⇒ y = 0 ⇒ net < 0 ⇒ b + w1x1 + w2x2 < 0
McCulloch-Pitts Neuron
Class A ⇒ b + w1x1 + w2x2 ≥ 0
Class B ⇒ b + w1x1 + w2x2 < 0
Therefore, the decision boundary is a hyperplane (a straight line in two dimensions) given by
b + w1x1 + w2x2 = 0
Where do w1 and w2 come from?
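As a quick illustration, the snippet below labels points by the sign of b + w1x1 + w2x2; the particular weights are arbitrary example values, not learned ones:

def classify(x1, x2, w1=1.0, w2=1.0, b=-1.0):
    # Class A if b + w1*x1 + w2*x2 >= 0, otherwise Class B
    return "A" if b + w1 * x1 + w2 * x2 >= 0 else "B"

for point in [(0.2, 0.3), (0.9, 0.8), (2.0, -0.5), (-1.0, 1.5)]:
    print(point, "-> Class", classify(*point))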
McCulloch-Pitts Neuron
Solution: more neurons are required.
[Figure: three (x1, x2) plots showing x1 AND x2, x1 OR x2, and x1 XOR x2; AND and OR can be separated by a single line, while XOR is marked with a ?]
McCulloch-Pitts Neuron
Nonlinear Classification
[Figure: Class A (y=1) and Class B (y=0) points in the (x1, x2) plane that cannot be separated by a single straight line]
Network Architecture
Single Layer Feed-forward Network
Single layer: there is only one computational layer.
Feed-forward: the input layer projects to the output layer, not vice versa.
[Figure: an input layer of source nodes fully connected to an output layer of neurons (Σ, f)]
Network Architecture
Multilayer Feed-forward Network
[Figure: an input layer of source nodes, a hidden layer of neurons, and an output layer of neurons, connected layer to layer]
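A sketch of the forward pass through such a network (one hidden layer, sigmoid activations; all names and sizes below are illustrative):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W1, b1, W2, b2):
    # Forward pass: input layer -> hidden layer -> output layer
    hidden = sigmoid(W1 @ x + b1)       # hidden-layer neurons
    return sigmoid(W2 @ hidden + b2)    # output-layer neurons

rng = np.random.default_rng(0)
x = np.array([0.5, -0.2, 0.8])                    # 3 source nodes
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)     # 4 hidden neurons
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)     # 2 output neurons
print(forward(x, W1, b1, W2, b2))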
Network Architecture
Single Layer Recurrent Network
[Figure: a single layer of neurons whose outputs are fed back as inputs to the same layer]
Network Architecture
Multilayer Recurrent Network
[Figure: multiple layers of neurons with feedback connections between the layers]
Learning Processes
The mechanism by which a neural network can adjust its weights (the synaptic junction weights):
- Supervised learning: with a teacher
- Unsupervised learning: without a teacher
Learning Processes
Supervised Learning
[Figure: the environment supplies input data to both the teacher and the learner; the teacher provides the desired response, the learner produces the actual response, and the error signal (desired minus actual) drives the learner's weight adjustment]
Learning Processes
Unsupervised Learning
[Figure: the environment supplies input data directly to the learner; there is no teacher]
Neurons learn based on a competitive task.
A competition rule is required (a competitive-learning rule).
Perceptron
- The goal of the perceptron is to classify input data into two classes, A and B.
- This works only when the two classes can be separated by a linear boundary.
- The perceptron is built around the McCulloch-Pitts neuron model: a linear combiner followed by a hard limiter.
- Accordingly, the neuron can produce the outputs +1 and 0.
[Figure: inputs x1, x2, …, xm with weights w1, w2, …, wm and bias b feed a summing junction followed by a hard limiter, producing output y]
Perceptron
Equivalent Presentation
The bias b is treated as an extra weight w0 applied to a constant input of 1, so that
net = WᵀX
Weight vector: W = [w0 w1 … wm]ᵀ
Input vector: X = [1 x1 x2 … xm]ᵀ
[Figure: the same perceptron with an additional constant input 1 weighted by w0 replacing the separate bias b]
Perceptron
There exists a weight vector W such that we may state
WᵀX > 0 for every input vector X belonging to A
WᵀX ≤ 0 for every input vector X belonging to B
Perceptron
Elementary Perceptron Learning Algorithm
1) Initiation: w(0) = a random weight vector.
2) At time index n, form the input vector x(n).
3) If (wᵀx(n) > 0 and x(n) belongs to A) or (wᵀx(n) ≤ 0 and x(n) belongs to B), keep w(n) = w(n-1).
   Otherwise, correct the weights: w(n) = w(n-1) + ηx(n) if x(n) belongs to A, and w(n) = w(n-1) - ηx(n) if x(n) belongs to B.
4) Repeat steps 2-3 until w(n) converges.
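A compact sketch of this algorithm (our own variable and function names; class A is labeled 1 and class B is labeled 0, and the bias is folded in as w0 on a constant input of 1, as in the equivalent presentation above):

import numpy as np

def train_perceptron(X, labels, eta=0.1, epochs=100, seed=0):
    # Elementary perceptron learning on augmented inputs [1, x1, ..., xm]
    rng = np.random.default_rng(seed)
    X_aug = np.hstack([np.ones((len(X), 1)), X])   # constant input 1 carries w0
    w = rng.normal(size=X_aug.shape[1])            # 1) random initial weights
    for _ in range(epochs):
        for x, label in zip(X_aug, labels):        # 2) present each input vector
            in_class_A = w @ x > 0                 # 3) current decision
            if in_class_A and label == 0:          # misclassified B: subtract eta*x
                w -= eta * x
            elif not in_class_A and label == 1:    # misclassified A: add eta*x
                w += eta * x
    return w                                       # 4) fixed epoch count stands in for "until converged"

# Toy linearly separable data: class A (label 1) lies roughly above x1 + x2 = 1
X = np.array([[0.9, 0.9], [0.8, 0.6], [0.1, 0.2], [0.3, 0.1]])
labels = np.array([1, 1, 0, 0])
print(train_perceptron(X, labels))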
Linear Neuron
- When the activation function is simply f(x) = x, the neuron acts like an adaptive filter.
- In this case: y = net = wᵀx
- w = [w1 w2 … wm]ᵀ
- x = [x1 x2 … xm]ᵀ
[Figure: inputs x1, x2, …, xm with weights w1, w2, …, wm and bias b feed a summing junction with a linear activation, producing output y]
Linear Neuron
Linear Neuron Learning (Adaptive Filtering)
[Figure: the environment supplies input data to both the teacher and the learner (the linear neuron); the teacher provides the desired response d(n), the neuron produces the actual response y(n), and the error signal is their difference]
Linear Neuron
LMS Algorithm
To minimize the value of the cost function defined as
E(w) = 0.5 e²(n)
where e(n) is the error signal
e(n) = d(n) - y(n) = d(n) - wᵀ(n)x(n)
the weight vector can be updated as follows:
wi(n+1) = wi(n) - μ (dE/dwi)
Linear Neuron
LMS Algorithm (continued)
dE/dwi = e(n) de(n)/dwi = e(n) d/dwi {d(n) - wᵀ(n)x(n)} = -e(n) xi(n)
Therefore
wi(n+1) = wi(n) + μ e(n) xi(n)
or, in vector form,
w(n+1) = w(n) + μ e(n) x(n)
Linear Neuron
Summary of LMS Algorithm
1) Initiation: w(0) = a random weight vector.
2) At time index n, form the input vector x(n).
3) y(n) = wᵀ(n)x(n)
4) e(n) = d(n) - y(n)
5) w(n+1) = w(n) + μ e(n) x(n)
6) Repeat steps 2-5 until w(n) converges.
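A sketch of the LMS loop on synthetic data (the names and the toy data are ours); the learned weights should approach the vector used to generate the desired responses:

import numpy as np

def lms(X, d, mu=0.05, epochs=20, seed=0):
    # LMS adaptive filtering: w(n+1) = w(n) + mu * e(n) * x(n)
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])      # 1) random initial weights
    for _ in range(epochs):
        for x_n, d_n in zip(X, d):       # 2) form the input vector x(n)
            y_n = w @ x_n                # 3) actual response
            e_n = d_n - y_n              # 4) error signal
            w = w + mu * e_n * x_n       # 5) weight update
    return w

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(200, 3))
d = X @ true_w + 0.01 * rng.normal(size=200)   # desired responses from a known filter
print(lms(X, d))                                # should approach [2.0, -1.0, 0.5]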
Multilayer Perceptron
- To solve the XOR problem
- To solve the nonlinear classification problem
- To deal with more complex problems
Multilayer Perceptron
- The activation function in a multilayer perceptron is usually a sigmoid function.
- This is because the sigmoid is differentiable, unlike the hard limiter used in the elementary perceptron.
Multilayer Perceptron
Architecture
[Figure: an input layer of source nodes, a hidden layer of neurons, and an output layer of neurons]
Multilayer Perceptron
Architecture
[Figure: layer k of neurons, receiving inputs from layer k-1 and sending its outputs to layer k+1]
Multilayer Perceptron
Consider a single neuron in a multilayer perceptron (neuron k):
[Figure: neuron k receives inputs y0 = 1, y1, …, ym with weights wk,0, wk,1, …, wk,m and produces output yk]
netk = Σ wk,i yi
yk = f(netk)
Multilayer Perceptron
Multilayer Perceptron Learning (Back-Propagation Algorithm)
[Figure: layer k-1 supplies the input data to neuron k of layer k; the teacher provides the desired response dk(n), the neuron produces the actual response yk(n), and the error signal is ek(n) = dk(n) - yk(n)]
Multilayer Perceptron
Back-Propagation Algorithm:
Cost function of neuron k: Ek = 0.5 ek²
Cost function of the network: E = Σ Ej = 0.5 Σ ej²
where
ek = dk - yk
ek = dk - f(netk)
ek = dk - f(Σ wk,i yi)
Multilayer Perceptron
Back-Propagation Algorithm:
Cost function of neuron k: Ek = 0.5 ek², with
ek = dk - yk = dk - f(netk) = dk - f(Σ wk,i yi)
Cost function of the network: E = Σ Ej = 0.5 Σ ej²
Multilayer Perceptron
Back-Propagation Algorithm:
To minimize the value of the cost function E(wk,i), the weight vector can be updated using a gradient-based algorithm as follows:
wk,i(n+1) = wk,i(n) - μ (dE/dwk,i)
dE/dwk,i = ?
Multilayer Perceptron
Back-Propagation Algorithm:
dE/dwk,i = (dE/dek)(dek/dyk)(dyk/dnetk)(dnetk/dwk,i) = (dE/dnetk)(dnetk/dwk,i) = δk yi
where the individual derivatives are
dE/dek = ek
dek/dyk = -1
dyk/dnetk = f'(netk)
dnetk/dwk,i = yi
and the local gradient of neuron k is
δk = (dE/dek)(dek/dyk)(dyk/dnetk) = -ek f'(netk)
Multilayer Perceptron
Back-Propagation Algorithm:
Substituting into the gradient-based algorithm:
wk,i(n+1) = wk,i(n) - μ δk yi
δk = -ek f'(netk) = -{dk - yk} f'(netk)
If k is an output neuron, we have all the terms of δk.
What if k is a hidden neuron?
Multilayer Perceptron
Back-Propagation Algorithm:
When k is a hidden neuron there is no desired response dk, so δk is expressed through dE/dyk:
δk = (dE/dyk)(dyk/dnetk) = f'(netk) (dE/dyk)
Multilayer Perceptron
dE/dyk = ?
Since E = 0.5 Σ ej², where j runs over the neurons of the next layer that receive yk,
dE/dyk = Σ ej (dej/dyk) = Σ ej (dej/dnetj)(dnetj/dyk) = Σ ej (-f'(netj)) wjk
       = -Σ ej f'(netj) wjk = Σ δj wjk
Multilayer Perceptron
We had δk = f'(netk) (dE/dyk).
Substituting dE/dyk = Σ δj wjk into δk results in
δk = f'(netk) Σ δj wjk
which gives the local gradient for the hidden neuron k.
Multilayer Perceptron
Summary of Back-Propagation Algorithm
1) Initiation: w(0) = a random weight vector.
   At time index n, get the input data and:
2) Calculate netk and yk for all the neurons.
3) For output neurons, calculate ek = dk - yk.
4) For output neurons, δk = -ek f'(netk).
5) For hidden neurons, δk = f'(netk) Σ δj wjk.
6) Update every neuron's weights by wk,i(NEW) = wk,i(OLD) - μ δk yi.
7) Repeat steps 2-6 for the next data set (next time index).
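A compact sketch of these steps for a network with one hidden layer and sigmoid activations (variable names, layer sizes, and hyperparameters are ours); it follows the convention above, δk = dE/dnetk, so weights are updated with a minus sign, and with these illustrative settings it should learn the XOR problem mentioned earlier after enough passes:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, d, W1, W2, mu=0.5):
    # One back-propagation step: forward pass, local gradients, weight update (in place)
    x_aug = np.append(x, 1.0)                  # constant 1 carries the bias weight
    net1 = W1 @ x_aug;  y1 = sigmoid(net1)     # step 2: hidden layer
    y1_aug = np.append(y1, 1.0)
    net2 = W2 @ y1_aug; y2 = sigmoid(net2)     # step 2: output layer
    e = d - y2                                 # step 3: output error
    delta2 = -e * y2 * (1 - y2)                # step 4: f'(net) = y(1-y) for the sigmoid
    delta1 = (W2[:, :-1].T @ delta2) * y1 * (1 - y1)   # step 5: hidden local gradients
    W2 -= mu * np.outer(delta2, y1_aug)        # step 6: w <- w - mu * delta * input
    W1 -= mu * np.outer(delta1, x_aug)
    return 0.5 * np.sum(e ** 2)

# Toy example: XOR (step 7: repeat over the data many times)
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 3))        # 3 hidden neurons, 2 inputs + bias
W2 = rng.normal(scale=0.5, size=(1, 4))        # 1 output neuron, 3 hidden + bias
data = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
for epoch in range(5000):
    for x, d in data:
        train_step(np.array(x, float), np.array(d, float), W1, W2)
for x, _ in data:
    y1 = sigmoid(W1 @ np.append(x, 1.0))
    print(x, sigmoid(W2 @ np.append(y1, 1.0)).round(2))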
THE END