Introduction to
Neural Networks Computing
CMSC491N/691N, Spring 2001
Notations
units: X_i, Y_j
activation/output: x_i, y_j
   if X_i is an input unit, x_i = input signal
   for other units Y_j, y_j = f(y_in_j),
   where f(.) is the activation function for Y_j
weights: w_ij, from unit X_i to unit Y_j (other books use w_ji)
bias: b_j (a constant input)
threshold: θ_j (for units with step/threshold
activation function)
weight matrix: W = { w_ij }
   i: row index; j: column index
            0  5  2     (row vector w_1•)
      W =   3  0  4     (row vector w_2•)
            1  6 -1     (row vector w_3•)
      columns: w_•1, w_•2, w_•3 (column vectors)

vectors of weights:
   w_•j = (w_1j, w_2j, w_3j): weights that come into unit j
   w_i• = (w_i1, w_i2, w_i3): weights that go out of unit i

(Figure: three units 1, 2, 3, with the entries of W labeling the
connections between them.)
training input vector: s = (s_1, s_2, ......, s_n)
training (or target) output vector: t = (t_1, t_2, ......, t_m)
input vector (for computation): x = (x_1, x_2, ......, x_n)
learning/training: Δw_ij = w_ij(new) − w_ij(old), W = { w_ij }
learning rate α: specifies the scale of Δw_ij; usually small
Review of Matrix Operations
Vector: a sequence of elements (the order is important)
e.g., x=(2, 1) denotes a vector
length = sqrt(2*2+1*1)
orientation angle = a
x=(x1, x2, ……, xn), an n dimensional vector
a point in an n dimensional space
column vector: x = (1, 2, 5, 8)^T, written as a column
row vector: its transpose, y = x^T = (1 2 5 8); note (x^T)^T = x
(Figure: the vector X = (2, 1) drawn in the plane, with
orientation angle a.)
norms of a vector (magnitude):
   L1 norm:   ||x||_1 = Σ_{i=1}^{n} |x_i|
   L2 norm:   ||x||_2 = ( Σ_{i=1}^{n} x_i^2 )^{1/2}
   L∞ norm:   ||x||_∞ = max |x_i|, 1 ≤ i ≤ n

vector operations:
   scalar multiple: rx = (rx_1, rx_2, ......, rx_n), r a scalar,
   x a column vector
   inner (dot) product: x^T · y = Σ_{i=1}^{n} x_i y_i, a scalar,
   where x, y are column vectors of the same dimension
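For concreteness, a small Python sketch of these norms and the inner
product (illustrative code, not part of the original slides):

```python
# Vector norms (L1, L2, L-infinity) and the inner product defined above.
import math

def l1_norm(x):
    return sum(abs(xi) for xi in x)

def l2_norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

def linf_norm(x):
    return max(abs(xi) for xi in x)

def dot(x, y):
    # x^T y: inner product of two vectors of the same dimension (a scalar)
    return sum(xi * yi for xi, yi in zip(x, y))

x = (2, 1)
print(l1_norm(x), l2_norm(x), linf_norm(x))  # 3, sqrt(5) ~ 2.236, 2
print(dot((1, 2, 5, 8), (1, 0, 0, 1)))       # 1*1 + 8*1 = 9
```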
Cross product: x × y
defines another vector orthogonal to the plane
formed by x and y.
matrix: A_{m×n} = { a_ij } =

      a_11  a_12  ......  a_1n
      ...
      a_m1  a_m2  ......  a_mn

   a_ij: the element on the ith row and jth column
   a_ii: a diagonal element
   w_ij: a weight in a weight matrix W

each row or column is a vector:
   a_•j = (a_1j, ......, a_mj): jth column vector
   a_i• = (a_i1, ......, a_in): ith row vector
a column vector of dimension m is a matrix of m×1

transpose: A^T_{n×m}; the jth column of A becomes the jth row of A^T

      A^T =   a_11  a_21  ......  a_m1
              ...
              a_1n  a_2n  ......  a_mn

square matrix: A_{n×n}

identity matrix: I = { a_ij }, where a_ij = 1 if i = j, 0 otherwise

      I =   1  0  .....  0
            0  1  ......  0
            0  0  ......  1
symmetric matrix: m = n and A = A^T, i.e., a_i• = a_•i ∀i,
or a_ij = a_ji ∀i, j

matrix operations:
   scalar multiple: rA = ( ra_ij )
   vector-matrix product:
      x^T · A = (x_1 ...... x_m) · (a_•1, ......, a_•n)_{m×n}
              = ( x^T · a_•1, ......, x^T · a_•n )
   The result is a row vector, each element of which is
   an inner product of x^T and a column vector a_•j
product of two matrices:
   A_{m×n} × B_{n×p} = C_{m×p}, where c_ij = a_i• · b_•j
   A_{m×n} × I_{n×n} = A_{m×n}

vector outer product:
   x · y^T, with x = (x_1, ......, x_m)^T and y = (y_1, ......, y_n)^T,
   is the m×n matrix whose (i, j) element is x_i y_j:

      x_1·y_1  ......  x_1·y_n
      ...
      x_m·y_1  ......  x_m·y_n
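A quick NumPy check of these operations (an illustrative sketch
assuming NumPy is available; not part of the original slides):

```python
# Matrix product, identity, and vector outer product with NumPy.
import numpy as np

A = np.array([[0, 5, 2],
              [3, 0, 4],
              [1, 6, -1]])        # the 3x3 weight matrix used earlier
B = np.random.rand(3, 2)           # any 3x2 matrix

C = A @ B                          # c_ij = (ith row of A) . (jth column of B)
assert C.shape == (3, 2)
assert np.allclose(A @ np.eye(3), A)   # A x I = A

x = np.array([1, 2, 5])            # column vector (m = 3)
y = np.array([4, 7])               # column vector (n = 2)
outer = np.outer(x, y)             # m x n matrix with (i, j) element x_i * y_j
assert outer.shape == (3, 2) and outer[2, 1] == 5 * 7
```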
Calculus and Differential Equations
• ẋ_i(t): the derivative of x_i(t), with respect to time t
• System of differential equations

      ẋ_1(t) = f_1(t)
      ...
      ẋ_n(t) = f_n(t)

  solution: ( x_1(t), ......, x_n(t) )
  difficult to solve unless the f_i(t) are simple
• Multi-variable calculus:
  partial derivative: gives the direction and speed of
  change of y, with respect to x_i

      y(t) = f( x_1(t), x_2(t), ......, x_n(t) )

  example: y = x_1^2 + sin(x_2) + e^{−(x_1 + x_2 + x_3)}

      ∂y/∂x_1 = 2·x_1 − e^{−(x_1 + x_2 + x_3)}
      ∂y/∂x_2 = cos(x_2) − e^{−(x_1 + x_2 + x_3)}
      ∂y/∂x_3 = −e^{−(x_1 + x_2 + x_3)}
the total derivative:
  Chain-rule: y is a function of the x_i, and each x_i is a
  function of t, with y(t) = f( x_1(t), x_2(t), ......, x_n(t) ):

      dy/dt = (∂f/∂x_1)·ẋ_1(t) + ...... + (∂f/∂x_n)·ẋ_n(t)
            = ∇f · ( ẋ_1(t), ......, ẋ_n(t) )^T

  Gradient of f:

      ∇f = ( ∂f/∂x_1, ......, ∂f/∂x_n )
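A numeric sanity check of the chain rule; the function f and the
trajectories x1(t), x2(t) below are illustrative choices, not from the
slides:

```python
# Verify dy/dt = sum_i (df/dx_i) * dx_i/dt against a finite difference.
import math

f = lambda x1, x2: x1 ** 2 + math.sin(x2)   # example f
x1 = lambda t: 3 * t                        # the x_i as functions of t
x2 = lambda t: t ** 2

def dydt_chain(t):
    # df/dx1 = 2*x1, dx1/dt = 3;  df/dx2 = cos(x2), dx2/dt = 2t
    return 2 * x1(t) * 3 + math.cos(x2(t)) * 2 * t

def dydt_numeric(t, h=1e-6):
    y = lambda s: f(x1(s), x2(s))
    return (y(t + h) - y(t - h)) / (2 * h)

t = 0.7
assert abs(dydt_chain(t) - dydt_numeric(t)) < 1e-5
```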
dynamic system:

      ẋ_1(t) = f_1( x_1, ......, x_n )
      ...
      ẋ_n(t) = f_n( x_1, ......, x_n )

– change of x_i may potentially affect other x
– all x_i continue to change (the system evolves)
– reaches equilibrium when ẋ_i = 0 ∀i
– stability/attraction: special equilibrium point
  (minimal energy state)
– pattern of ( x_1, ......, x_n ) at a stable state often
  represents a solution
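A minimal sketch of such a system evolving toward equilibrium; the
one-variable dynamics ẋ = −x used here is an illustrative choice, not
from the slides:

```python
# Euler integration of a dynamic system dx/dt = f(x) = -x.
# The state keeps changing until dx/dt ~ 0: the equilibrium
# (and attractor) is x = 0.
f = lambda x: -x

x, dt = 5.0, 0.01
for _ in range(2000):
    x += dt * f(x)        # x changes as long as f(x) != 0
print(x)                  # close to 0: the equilibrium state
```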
Chapter 2: Simple Neural Networks
for Pattern Classification
• General discussion
• Linear separability
• Hebb nets
• Perceptron
• Adaline
General discussion
• Pattern recognition
– Patterns: images, personal records, driving habits, etc.
– Represented as a vector of features (encoded as
integers or real numbers in NN)
– Pattern classification:
• Classify a pattern to one of the given classes
• Form pattern classes
– Pattern associative recall
• Using a pattern to recall a related pattern
• Pattern completion: using a partial pattern to recall the
whole pattern
• Pattern recovery: deals with noise, distortion, missing
information
• General architecture
  Single layer
  (Figure: input units x1, ..., xn and a constant input 1 feed the
  output unit Y through weights w1, ..., wn and b.)
  net input to Y:

      net = b + Σ_{i=1}^{n} x_i w_i

  bias b is treated as the weight from a special unit with
  constant output 1.
  threshold θ related to Y
  output:

      y = f(net) =  1   if net ≥ θ
                   -1   if net < θ

  classify ( x_1, ......, x_n ) into one of the two classes
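A minimal Python sketch of this single-layer unit (with θ = 0; the AND
weights used in the example come from the linearly-separable examples a
few slides below):

```python
# Single-layer classifier: net = b + sum_i x_i * w_i,
# output 1 if net >= theta, else -1.
def classify(x, w, b, theta=0):
    net = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if net >= theta else -1

# Example: the bipolar AND weights given later (w1 = w2 = 1, b = -1).
w, b = (1, 1), -1
for x in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(x, classify(x, w, b))   # -1, -1, -1, 1
```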
• Decision region/boundary
  n = 2, b != 0, θ = 0:

      b + x1·w1 + x2·w2 = 0   or   x2 = −(w1/w2)·x1 − b/w2

  is a line, called the decision boundary, which partitions the
  plane into two decision regions
  If a point/pattern (x1, x2) is in the positive region, then
  b + x1·w1 + x2·w2 ≥ 0, and the output is one (belongs to
  class one)
  Otherwise, b + x1·w1 + x2·w2 < 0, output –1 (belongs to
  class two)
  n = 2, b = 0, θ != 0 would result in a similar partition
  (Figure: the boundary line in the (x1, x2) plane, with the
  positive region marked + and the negative region marked -.)
– If n = 3 (three input units), then the decision
boundary is a two dimensional plane in a three
dimensional space
– In general, a decision boundary is an
n-1 dimensional hyper-plane in an n dimensional
space, which partitions the space into two decision
regions
– This simple network thus can classify a given pattern
into one of the two classes, provided one of these two
classes is entirely in one decision region (one side of
the decision boundary) and the other class is in
another region.
– The decision boundary is determined completely by
  the weights W and the bias b (or threshold θ):

      b + Σ_{i=1}^{n} x_i w_i = 0
Linear Separability Problem
• If two classes of patterns can be separated by a decision boundary,
  represented by the linear equation

      b + Σ_{i=1}^{n} x_i w_i = 0

  then they are said to be linearly separable. The simple network can
  correctly classify any such patterns.
• Decision boundary (i.e., W, b or θ) of linearly separable classes can
be determined either by some learning procedures or by solving
linear equation systems based on representative patterns of each
class
• If such a decision boundary does not exist, then the two classes are
said to be linearly inseparable.
• Linearly inseparable problems cannot be solved by the simple
network; a more sophisticated architecture is needed.
• Examples of linearly separable classes
- Logical AND function
  patterns (bipolar):          decision boundary:
    x1   x2    y                w1 = 1
    -1   -1   -1                w2 = 1
    -1    1   -1                b  = -1
     1   -1   -1                θ  = 0
     1    1    1                -1 + x1 + x2 = 0
- Logical OR function
  patterns (bipolar):          decision boundary:
    x1   x2    y                w1 = 1
    -1   -1   -1                w2 = 1
    -1    1    1                b  = 1
     1   -1    1                θ  = 0
     1    1    1                1 + x1 + x2 = 0
(Plots: for AND, a single x at (1, 1) and o's at the other three
corners; for OR, x's at three corners and a single o at (−1, −1).
x: class I (y = 1), o: class II (y = -1).)
• Examples of linearly inseparable classes
- Logical XOR (exclusive OR) function
patterns (bipolar) decision boundary
x1 x2 y
-1 -1 -1
-1 1 1
1 -1 1
1 1 -1
No line can separate these two classes, as can be seen from
the fact that the following linear inequality system has no
solution
because we have b < 0 from
(1) + (4), and b >= 0 from
(2) + (3), which is a
contradiction
      b − w1 − w2 < 0    (1)
      b − w1 + w2 ≥ 0    (2)
      b + w1 − w2 ≥ 0    (3)
      b + w1 + w2 < 0    (4)

(Plot: x's at (−1, 1) and (1, −1), o's at (−1, −1) and (1, 1).
x: class I (y = 1), o: class II (y = -1).)
– XOR can be solved by a more complex network with
hidden units
(Network: inputs x1, x2 feed hidden units z1, z2, each with
threshold θ = 1, through weights of ±2; z1, z2 feed the output
unit Y, with threshold θ = 0, through weights of 2.)

      (x1, x2)     (z1, z2)      y
      (-1, -1)     (-1, -1)     -1
      (-1,  1)     (-1,  1)      1
      ( 1, -1)     ( 1, -1)      1
      ( 1,  1)     (-1, -1)     -1
Hebb Nets
• Hebb, in his influential book The Organization of
  Behavior (1949), claimed
  – Behavior changes are primarily due to the changes of
    synaptic strengths ( w_ij ) between neurons i and j
  – w_ij increases only when both i and j are "on": the
    Hebbian learning law
  – In ANN, the Hebbian law can be stated: w_ij increases
    only if the outputs of both units x_i and y_j have the
    same sign.
– In our simple network (one output and n input units)
      Δw_ij = w_ij(new) − w_ij(old) = x_i · y

  or, with a learning rate α:

      Δw_ij = w_ij(new) − w_ij(old) = α · x_i · y
• Hebb net (supervised) learning algorithm (p.49)
Step 0. Initialization: b = 0, wi = 0, i = 1 to n
Step 1. For each of the training sample s:t do steps 2 -4
/* s is the input pattern, t the target output of the sample */
Step 2. xi := si, i = 1 to n /* set s to input units */
Step 3. y := t /* set y to the target */
Step 4. wi := wi + xi * y, i = 1 to n /* update weight */
b := b + xi * y /* update bias */
Notes: 1) α = 1, 2) each training sample is used only once.
• Examples: AND function
– Binary units (1, 0)
(x1, x2, 1) y=t w1 w2 b
(1, 1, 1) 1 1 1 1
(1, 0, 1) 0 1 1 1
(0, 1, 1) 0 1 1 1
(0, 0, 1) 0 1 1 1
An incorrect boundary,
1 + x1 + x2 = 0,
is learned after using
each sample once.
(The constant third input 1 is the bias unit.)
– Bipolar units (1, -1)
– It will fail to learn x1 ^ x2 ^ x3, even though the function is
linearly separable.
– Stronger learning methods are needed.
• Error driven: for each sample s:t, compute y from s
based on current W and b, then compare y and t
• Use training samples repeatedly, and each time only
change weights slightly (α << 1)
• Learning methods of Perceptron and Adaline are good
examples
(x1, x2, 1) y=t w1 w2 b
(1, 1, 1) 1 1 1 1
(1, -1, 1) -1 0 2 0
(-1, 1, 1) -1 1 1 -1
(-1, -1, 1) -1 2 2 -2
A correct boundary
-1 + x1 + x2 = 0
is successfully learned
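A minimal Python sketch of the Hebb algorithm above (α = 1, one pass),
reproducing the bipolar AND run in the preceding table; function and
variable names are illustrative:

```python
# Hebb net learning: one pass over the samples, wi := wi + xi * y.
def hebb_train(samples):
    """samples: list of (s, t) pairs; returns learned weights w and bias b."""
    n = len(samples[0][0])
    w = [0.0] * n          # Step 0: wi = 0
    b = 0.0                # Step 0: b = 0
    for s, t in samples:   # Step 1: each sample is used only once
        x, y = s, t        # Steps 2-3: set inputs and output
        for i in range(n): # Step 4: wi := wi + xi * y
            w[i] += x[i] * y
        b += y             # bias update: b := b + 1 * y (bias input is 1)
    return w, b

# Bipolar AND: class +1 only for (1, 1)
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = hebb_train(and_samples)
print(w, b)  # [2.0, 2.0] and -2.0: the boundary -2 + 2*x1 + 2*x2 = 0
```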
Perceptrons
• By Rosenblatt (1962)
– For modeling visual perception (retina)
– Three layers of units: Sensory, Association, and Response
– Learning occurs only on weights from A units to R units
(weights from S units to A units are fixed).
– A single R unit receives inputs from n A units (same
architecture as our simple network)
– For a given training sample s:t, change weights only if the
computed output y is different from the target output t
(thus error driven)
• Perceptron learning algorithm (p.62)
Step 0. Initialization: b = 0, wi = 0, i = 1 to n
Step 1. While stop condition is false do steps 2-5
Step 2. For each of the training sample s:t do steps 3 -5
Step 3. xi := si, i = 1 to n
Step 4. compute y
Step 5. If y != t
wi := wi + α ∗ xi * t, i = 1 to n
b := b + α * t
Notes:
- Learning occurs only when a sample has y != t
- Two loops, a completion of the inner loop (each sample
is used once) is called an epoch
Stop condition
- When no weight is changed in the current epoch, or
- When pre-determined number of epochs is reached
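A minimal sketch of this algorithm in Python (a bipolar step activation
with θ = 0 is assumed here, as in the earlier decision-boundary slides):

```python
# Perceptron learning: update w and b only when a sample is misclassified.
def perceptron_train(samples, alpha=1.0, max_epochs=100):
    n = len(samples[0][0])
    w = [0.0] * n                      # Step 0: wi = 0
    b = 0.0                            # Step 0: b = 0
    for _ in range(max_epochs):        # Step 1: loop until stop condition
        changed = False
        for s, t in samples:           # Step 2: for each training sample s:t
            x = s                      # Step 3: xi := si
            net = b + sum(xi * wi for xi, wi in zip(x, w))
            y = 1 if net >= 0 else -1  # Step 4: compute y (theta = 0 assumed)
            if y != t:                 # Step 5: error-driven update
                for i in range(n):
                    w[i] += alpha * x[i] * t
                b += alpha * t
                changed = True
        if not changed:                # stop: no weight changed this epoch
            break
    return w, b
```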
Informal justification: Consider y = 1 and t = -1
– To move y toward t, the weights should reduce net_y
– If xi = 1, then xi * t < 0: need to reduce wi (xi*wi is reduced)
– If xi = -1, then xi * t > 0: need to increase wi (xi*wi is reduced)
See book (pp. 62-68) for an example of execution
• Perceptron learning rule convergence theorem
– Informal: any problem that can be represented by a
perceptron can be learned by the learning rule
– Theorem: If there is a weight vector W* such that
  f( x(p) · W* ) = t(p) for all P training sample patterns
  { x(p), t(p) }, then for any start weight vector W0, the
  perceptron learning rule will converge to a weight vector
  W1 such that f( x(p) · W1 ) = t(p)
  for all p. ( W* and W1 may not be the same.)
– Proof: reading for grad students (pp. 77-79)
Adaline
• By Widrow and Hoff (1960)
– Adaptive Linear Neuron for signal processing
– The same architecture as our simple network
– Learning method: delta rule (another way of error driven),
also called Widrow-Hoff learning rule
– The delta: t – y_in
• NOT t – y because y = f( y_in ) is not differentiable
– Learning algorithm: same as Perceptron learning except in
Step 5:
b := b + α * (t – y_in)
wi := wi + α * xi * (t – y_in)
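A minimal sketch of this learning rule in Python (sequential mode;
function and variable names are illustrative):

```python
# Adaline delta-rule update: same outer structure as the perceptron,
# but driven by the difference t - y_in rather than t - y.
def adaline_train(samples, alpha=0.1, epochs=50):
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, t in samples:
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))  # linear net input
            delta = t - y_in          # the delta: t - y_in (not t - y)
            b += alpha * delta
            for i in range(n):
                w[i] += alpha * x[i] * delta
    return w, b
```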
• Derivation of the delta rule
– Error for all P samples: mean square error

      E = (1/P) · Σ_{p=1}^{P} ( t(p) − y_in(p) )²

  • E is a function of W = {w1, ... wn}
– Learning takes the gradient descent approach to reduce E by
  modifying W
  • the gradient of E:

      ∇E = ( ∂E/∂w1, ......, ∂E/∂wn )

  • Δwi ∝ −∂E/∂wi
  • the partial derivative:

      ∂E/∂wi = (1/P) · Σ_{p=1}^{P} [ 2·( t(p) − y_in(p) ) ] · ∂( t(p) − y_in(p) )/∂wi
             = −(2/P) · Σ_{p=1}^{P} ( t(p) − y_in(p) ) · xi

  • Therefore:

      Δwi ∝ −∂E/∂wi = (2/P) · Σ_{p=1}^{P} ( t(p) − y_in(p) ) · xi
• How to apply the delta rule
– Method 1 (sequential mode): change wi after each
  training pattern by α · ( t(p) − y_in(p) ) · xi
– Method 2 (batch mode): change wi at the end of each
  epoch. Within an epoch, cumulate α · ( t(p) − y_in(p) ) · xi
  for every pattern (x(p), t(p))
– Method 2 is slower but may provide slightly better results
(because Method 1 may be sensitive to the sample
ordering)
• Notes:
– E monotonically decreases until the system reaches a state
with (local) minimum E (a small change of any wi will
cause E to increase).
– At a local minimum E state, ∂E/∂wi = 0 ∀i, but E is not
  guaranteed to be zero
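A sketch of the batch-mode bookkeeping described above (illustrative;
the sequential mode is the adaline_train sketch shown earlier):

```python
# Batch-mode delta rule: cumulate the updates within an epoch,
# apply them once at the end of the epoch.
def adaline_epoch_batch(samples, w, b, alpha):
    n = len(w)
    dw, db = [0.0] * n, 0.0
    for x, t in samples:
        y_in = b + sum(xi * wi for xi, wi in zip(x, w))
        delta = t - y_in
        db += alpha * delta
        for i in range(n):
            dw[i] += alpha * x[i] * delta   # cumulate within the epoch
    for i in range(n):
        w[i] += dw[i]                       # apply at the end of the epoch
    return w, b + db
```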
Summary of these simple networks
• Single layer nets have limited representation power
(linear separability problem)
• Error-driven learning seems a good way to train a net
• Multi-layer nets (or nets with non-linear hidden
  units) may overcome the linear inseparability problem,
  but learning methods for such nets are needed
• Threshold/step output functions hinder the effort to
  develop learning methods for multi-layered nets
Why hidden units must be non-linear?
• Multi-layer net with linear hidden layers is equivalent to a
single layer net
– Because z1 and z2 are linear units
      z1 = a1 * (x1*v11 + x2*v21) + b1
      z2 = a2 * (x1*v12 + x2*v22) + b2
– y_in = z1*w1 + z2*w2
       = x1*u1 + x2*u2 + (b1*w1 + b2*w2), where
      u1 = a1*v11*w1 + a2*v12*w2, u2 = a1*v21*w1 + a2*v22*w2
  y_in is still a linear combination of x1 and x2.
(Network: inputs x1, x2 connect to linear hidden units z1, z2
through weights v11, v12, v21, v22; z1, z2 connect to the output
unit Y (θ = 0) through w1, w2.)
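A quick numeric check of this equivalence (all weight values below are
arbitrary illustrative choices):

```python
# Verify that a linear hidden layer collapses to a single linear layer.
a1, a2 = 0.5, 2.0
v11, v12, v21, v22 = 0.3, -0.7, 1.1, 0.4
w1, w2 = 0.9, -0.2
b1, b2 = 0.1, -0.5

def y_in_two_layer(x1, x2):
    z1 = a1 * (x1 * v11 + x2 * v21) + b1
    z2 = a2 * (x1 * v12 + x2 * v22) + b2
    return z1 * w1 + z2 * w2

# Collapsed single-layer equivalent:
u1 = a1 * v11 * w1 + a2 * v12 * w2
u2 = a1 * v21 * w1 + a2 * v22 * w2
bias = b1 * w1 + b2 * w2

for x1, x2 in [(1, -1), (0.5, 2.0), (-3, 4)]:
    assert abs(y_in_two_layer(x1, x2) - (x1 * u1 + x2 * u2 + bias)) < 1e-12
print("two-layer linear net == single-layer linear net")
```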