3. WHAT IS IT?
An artificial neural network is a crude attempt to simulate the human brain digitally
Human brain – approximately 10 billion neurons
Each neuron is connected to thousands of others
Parts of a neuron:
Cell body
Dendrites – receive input signals
Axon – carries the output signal
4. INTRODUCTION
ANN – made up of artificial neurons, each a digitally modeled biological neuron
Each input into the neuron has its own associated weight
As each input enters the nucleus, it is multiplied by its weight
5. INTRODUCTION
The nucleus sums all these weighted input values, which gives us the activation
For n inputs and n weights – each input is multiplied by its weight and the products are summed:
a = x1w1 + x2w2 + x3w3 + ... + xnwn
6. INTRODUCTION
If the activation is greater than a threshold value, the neuron outputs a signal (for example, 1)
If the activation is less than the threshold, the neuron outputs zero
This is typically called a step function
7. INTRODUCTION
The combination of summation and thresholding is called a node
For a step (activation) function, the output is 1 if:
x1w1 + x2w2 + x3w3 + ... + xnwn > T
[Figure: http://www-cse.uta.edu/~cook/ai1/lectures/figures/neuron.jpg]
8. INTRODUCTION
x1w1 + x2w2 + x3w3 + ... + xnwn > T
x1w1 + x2w2 + x3w3 + ... + xnwn - T > 0
Let w0 = -T and x0 = 1
D = x0w0 + x1w1 + x2w2 + x3w3 + ... + xnwn
Output is 1 if D > 0; output is 0 otherwise
w0 is called a bias weight
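This bias trick is easy to check in code. Below is a minimal Python sketch; the weights, inputs, and threshold values are illustrative assumptions, not from the slides:

def threshold_unit(inputs, weights, threshold):
    # Fold the threshold in as a bias weight: w0 = -T with constant input x0 = 1
    x = [1.0] + list(inputs)
    w = [-threshold] + list(weights)
    d = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if d > 0 else 0

# Example: two inputs with weights 0.5 each and threshold 0.7 (illustrative values)
print(threshold_unit([1, 1], [0.5, 0.5], 0.7))  # 1, since 0.5 + 0.5 - 0.7 > 0
print(threshold_unit([1, 0], [0.5, 0.5], 0.7))  # 0, since 0.5 - 0.7 < 0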
9. TYPICAL ACTIVATION FUNCTIONS
[Figure: plots of four activation functions in the X–Y plane]
Step function: Y = 1 if X ≥ 0, Y = 0 if X < 0
Sign function: Y = +1 if X ≥ 0, Y = -1 if X < 0
Sigmoid function: Y = 1 / (1 + e^(-X))
Linear function: Y = X
Controls when unit is “active” or “inactive”
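A direct Python transcription of these four functions, using the ≥ 0 convention from the definitions above:

import math

def step(x):
    return 1 if x >= 0 else 0       # step: 1 for x >= 0, else 0

def sign(x):
    return 1 if x >= 0 else -1      # sign: +1 for x >= 0, else -1

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # smooth squashing into (0, 1)

def linear(x):
    return x                        # identity: output equals activation

for f in (step, sign, sigmoid, linear):
    print(f.__name__, f(-2.0), f(0.0), f(2.0))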
10. AN ARTIFICIAL NEURON- SUMMARY SO FAR
Receives n inputs
Multiplies each input by its weight
Applies an activation function to the sum of the weighted inputs
Outputs the result
[Figure: http://www-cse.uta.edu/~cook/ai1/lectures/figures/neuron.jpg]
12. A MOTIVATING EXAMPLE
Each day you get lunch at the cafeteria.
Your diet consists of fish, chips, and drink.
You get several portions of each
The cashier only tells you the total price of the meal
After several days, you should be able to figure out the price of each portion.
Each meal price gives a linear constraint on the prices of the portions:
price = x_fish·w_fish + x_chips·w_chips + x_drink·w_drink
13. SOLVING THE PROBLEM
The prices of the portions are like the weights of a linear neuron:
w = (w_fish, w_chips, w_drink)
We will start with guesses for the weights and then adjust the guesses to give a better fit to the prices given by the cashier.
14. THE CASHIER’S BRAIN
[Figure: a linear neuron – inputs: 2 portions of fish, 5 portions of chips, 3 portions of drink; weights (the true prices per portion): 150, 50, 100]
Price of meal = 2·150 + 5·50 + 3·100 = 850
15. A MODEL OF THE CASHIER'S BRAIN WITH ARBITRARY INITIAL WEIGHTS
[Figure: the same linear neuron – inputs: 2 portions of fish, 5 portions of chips, 3 portions of drink; guessed weights: 50, 50, 50]
Price of meal = 2·50 + 5·50 + 3·50 = 500
Residual error = 850 - 500 = 350
Apply the learning rule and update the weights
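A minimal sketch of this guess-and-adjust loop in Python, using the incremental delta rule from the gradient-descent slides further below; the learning rate and iteration count are illustrative assumptions. Note that a single observed meal does not determine the prices uniquely, so the loop settles on a pricing consistent with this meal (not necessarily the true prices); observing several different meals is what pins the true prices down.

def delta_rule_step(weights, inputs, target, lr):
    output = sum(w * x for w, x in zip(weights, inputs))  # linear neuron output
    error = target - output                               # residual error
    return [w + lr * error * x for w, x in zip(weights, inputs)]

weights = [50.0, 50.0, 50.0]       # arbitrary initial guesses: fish, chips, drink
inputs, target = [2, 5, 3], 850    # 2 fish + 5 chips + 3 drinks cost 850

for _ in range(100):
    weights = delta_rule_step(weights, inputs, target, lr=0.01)
print([round(w, 1) for w in weights])  # ~[68.4, 96.1, 77.6]: fits this meal, not the true prices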
16. PERCEPTRON
In 1958, Frank Rosenblatt introduced a training
algorithm that provided the first procedure for
training a simple ANN: a perceptron.
[Figure: a two-input perceptron – inputs x1 and x2, weighted by w1 and w2, feed a linear combiner; a hard limiter with a threshold then produces the output Y]
17. PERCEPTRON
A perceptron takes several inputs, x1, x2, …, and produces a single binary output.
The model consists of a linear combiner followed by a hard limiter.
The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and -1 if it is negative (1/0 in some models).
y = sgn( Σ(i=1..2) wi xi + θ ) = sgn( w1 x1 + w2 x2 + θ )
sgn(s) = 1 if s > 0, -1 otherwise
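The same model as a Python sketch of the two-input case (θ is kept as an explicit bias term; all values are supplied by the caller):

def perceptron(x, w, theta):
    s = sum(wi * xi for wi, xi in zip(w, x)) + theta  # linear combiner
    return 1 if s > 0 else -1                         # hard limiter: sgn(s)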
20. PERCEPTRON LEARNING
A perceptron (threshold unit) can learn anything
that it can represent (i.e. anything separable with a
hyperplane)
Example – the OR function (with 0/1 outputs) is linearly separable:
X1 X2 Y
0 0 0
0 1 1
1 0 1
1 1 1
21. OR FUNCTION
The two-input perceptron can implement the OR function when we set the weights: w0 = -0.3, w1 = w2 = 0.5
Decision hyperplane:
w0 + w1 x1 + w2 x2 = 0
-0.3 + 0.5 x1 + 0.5 x2 = 0
[Figure: the line -0.3 + 0.5 x1 + 0.5 x2 = 0 in the x1–x2 plane, separating the single negative point (0, 0) from the three positive points]
Training Data:
X1 X2 Y
0 0 -1
0 1 +1
1 0 +1
1 1 +1
23. A SINGLE PERCEPTRON CAN BE USED TO REPRESENT MANY BOOLEAN FUNCTIONS.
AND FUNCTION:
Decision hyperplane:
w0 + w1 x1 + w2 x2 = 0
-0.8 + 0.5 x1 + 0.5 x2 = 0
[Figure: the line -0.8 + 0.5 x1 + 0.5 x2 = 0 in the x1–x2 plane, separating the single positive point (1, 1) from the three negative points]
Training Examples:
X1 X2 Y
0 0 -1
0 1 -1
1 0 -1
1 1 +1
Test Results:
X1 X2 w0+ΣWiXi Y
0 0 -0.8 -1
0 1 -0.3 -1
1 0 -0.3 -1
1 1 0.2 +1
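A quick sketch that reproduces the test results above by plugging the slides' OR and AND weight settings into a two-input perceptron:

def perceptron2(x1, x2, w0, w1, w2):
    s = w0 + w1 * x1 + w2 * x2      # weighted sum including bias weight w0
    return 1 if s > 0 else -1       # hard limiter

settings = {"OR": (-0.3, 0.5, 0.5), "AND": (-0.8, 0.5, 0.5)}
for name, (w0, w1, w2) in settings.items():
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(name, x1, x2, "->", perceptron2(x1, x2, w0, w1, w2))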
24. XOR FUNCTION
A Perceptron cannot represent Exclusive OR since
it is not linearly separable.
X1 X2 Y
0 0 -1
0 1 +1
1 0 +1
1 1 -1
XOR Function
25. XOR FUNCTION
It is impossible to implement the XOR function with a single perceptron
Two perceptrons?
X1 X2 Y
0 0 -1
0 1 +1
1 0 +1
1 1 -1
XOR Function
26. 2D PLOT OF BASIC LOGICAL OPERATORS
[Figure: 2D plots in the x1–x2 plane of (a) AND (x1 ∧ x2), (b) OR (x1 ∨ x2), and (c) Exclusive-OR (x1 ⊕ x2)]
A perceptron can learn the operations
AND and OR, but not Exclusive-OR.
27. PERCEPTRON
The aim of the perceptron is to classify inputs, x1, x2, …, xn, into one of two classes, say A1 and A2.
In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the function:
Σ(i=1..n) xi wi = 0
28. LINEAR SEPARABILITY WITH PERCEPTRON
[Figure: (a) a two-input perceptron – the line x1w1 + x2w2 = 0 separates Class A1 from Class A2 in the x1–x2 plane; (b) a three-input perceptron – the plane x1w1 + x2w2 + x3w3 = 0 divides the x1–x2–x3 space into two decision regions]
31. TRAINING RULE DERIVATION – GRADIENT DESCENT
Objective: find the values of the weights that minimize the error function
O(d) is the observed output and T(d) is the target output for training example d
E = (1/2) Σ(d=1..m) (T(d) - O(d))²
O(d) = w0 + w1 x1(d) + w2 x2(d) + ... + wn xn(d)
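The slides jump from this error function to the update rule on the next slide; the connecting step is the usual gradient computation, sketched here using the definitions above:

∂E/∂wi = ∂/∂wi [ (1/2) Σ(d=1..m) (T(d) - O(d))² ] = -Σ(d=1..m) (T(d) - O(d)) xi(d)

Stepping downhill with learning rate η gives Δwi = -η ∂E/∂wi = η Σ(d=1..m) (T(d) - O(d)) xi(d), which is exactly the batch update on the next slide.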
32. BATCH GRADIENT DESCENT
Gradient-Descent(training_examples, η)
Each training example is a pair of the form <(x1,…,xn), t>, where (x1,…,xn) is the vector of input values, t is the target output value, and η is the learning rate (e.g. 0.1)
Initialize each wi to some small random value
Until the termination condition is met, do:
  Initialize each Δwi to zero
  For each <(x1,…,xn), t> in training_examples, do:
    Input the instance (x1,…,xn) to the linear unit and compute the output o
    For each linear unit weight wi, do:
      Δwi = Δwi + η (t - o) xi
  For each linear unit weight wi, do:
    wi = wi + Δwi
Equivalently, the accumulated update for each weight is Δwi = η Σ(d∈D) (td - od) xid
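A runnable Python sketch of this batch procedure for a linear unit; the cashier dataset, learning rate, and epoch count are illustrative assumptions (the three meals are consistent with per-portion prices of 150, 50, and 100):

import random

def batch_gradient_descent(examples, eta=0.03, epochs=20000):
    n = len(examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]  # small random init
    for _ in range(epochs):
        delta = [0.0] * n                             # initialize each Δwi to zero
        for x, t in examples:
            o = sum(wi * xi for wi, xi in zip(w, x))  # linear unit output
            for i in range(n):
                delta[i] += eta * (t - o) * x[i]      # accumulate Δwi
        w = [wi + di for wi, di in zip(w, delta)]     # wi = wi + Δwi
    return w

meals = [([2, 5, 3], 850), ([1, 2, 1], 350), ([3, 1, 2], 700)]
print([round(p) for p in batch_gradient_descent(meals)])  # converges to [150, 50, 100]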
33. INCREMENTAL GRADIENT DESCENT
The gradient descent training rule updates the weights by summing over all the training examples
Stochastic gradient descent approximates this by updating the weights incrementally
Calculate the error and update the weights after each example
34. INCREMENTAL GRADIENT DESCENT
Gradient-Descent(training_examples, η)
Each training example is a pair of the form <(x1,…,xn), t>, where (x1,…,xn) is the vector of input values, t is the target output value, and η is the learning rate (e.g. 0.1)
Initialize each wi to some small random value
Until the termination condition is met, do:
  For each <(x1,…,xn), t> in training_examples, do:
    Input the instance (x1,…,xn) to the linear unit and compute the output o
    For each linear unit weight wi, do:
      wi = wi + Δwi, where Δwi = η (t - o) xi
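The incremental variant as a Python sketch: identical setup to the batch version above, but the weights change after every example instead of once per pass (dataset and parameters are the same illustrative assumptions):

import random

def incremental_gradient_descent(examples, eta=0.03, epochs=20000):
    n = len(examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):
        for x, t in examples:                         # one update per example
            o = sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
    return w

meals = [([2, 5, 3], 850), ([1, 2, 1], 350), ([3, 1, 2], 700)]
print([round(p) for p in incremental_gradient_descent(meals)])  # converges to [150, 50, 100]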
49. MULTI-LAYER PERCEPTRON - MLP
Minsky & Papert (1969) offered a solution to the XOR problem by combining perceptron unit responses using a second layer of units
Piecewise linear classification using an MLP with threshold (perceptron) units
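A minimal Python sketch of that two-layer idea. The hidden weights are illustrative choices, not from the slides (the OR unit reuses the weights from slide 21; the NAND and AND weights are assumptions): one hidden unit computes OR, the other NAND, and the output unit ANDs them together, which yields XOR.

def step(s):
    return 1 if s > 0 else 0

def xor_mlp(x1, x2):
    h1 = step(-0.3 + 0.5 * x1 + 0.5 * x2)    # hidden unit 1: OR (weights from slide 21)
    h2 = step(1.3 - 1.0 * x1 - 1.0 * x2)     # hidden unit 2: NAND (fires unless both inputs are 1)
    return step(-1.5 + 1.0 * h1 + 1.0 * h2)  # output unit: h1 AND h2 = XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))  # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0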