3. WHAT IS IT?
An artificial neural network is a crude attempt to simulate the human brain digitally
Human brain – approximately 10 billion neurons
Each neuron is connected to thousands of others
Parts of a neuron:
Cell body
Dendrites – receive input signals
Axon – carries the output signal
4. INTRODUCTION
ANN – made up of artificial neurons, each a digitally modeled biological neuron
Each input into the neuron has its own associated weight
As each input enters the nucleus, it is multiplied by its weight
5. INTRODUCTION
The nucleus sums all these weighted input values, which gives us the activation
For n inputs and n weights – each input is multiplied by its weight and the products are summed:
a = x1w1 + x2w2 + x3w3 + ... + xnwn
6. INTRODUCTION
If the activation is greater than a threshold value, the neuron outputs a signal (for example, 1)
If the activation is less than the threshold, the neuron outputs zero
This is typically called a step function
7. INTRODUCTION
The combination of summation and thresholding is called a node
For a step (activation) function, the output is 1 if:
x1w1 + x2w2 + x3w3 + ... + xnwn > T
[Figure: http://www-cse.uta.edu/~cook/ai1/lectures/figures/neuron.jpg]
8. INTRODUCTION
x1w1 + x2w2 + x3w3 + ... + xnwn > T
x1w1 + x2w2 + x3w3 + ... + xnwn - T > 0
Let w0 = -T and x0 = 1
D = x0w0 + x1w1 + x2w2 + x3w3 + ... + xnwn
Output is 1 if D > 0; output is 0 otherwise
w0 is called a bias weight
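This bias trick is easy to check in code. Below is a minimal Python sketch; the weights, inputs, and threshold values are illustrative assumptions, not from the slides:

def threshold_unit(inputs, weights, threshold):
    # Fold the threshold in as a bias weight: w0 = -T with constant input x0 = 1
    x = [1.0] + list(inputs)
    w = [-threshold] + list(weights)
    d = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if d > 0 else 0

# Example: two inputs with weights 0.5 each and threshold 0.7 (illustrative values)
print(threshold_unit([1, 1], [0.5, 0.5], 0.7))  # 1, since 0.5 + 0.5 - 0.7 > 0
print(threshold_unit([1, 0], [0.5, 0.5], 0.7))  # 0, since 0.5 - 0.7 < 0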
9. TYPICAL ACTIVATION FUNCTIONS
[Figure: plots of four activation functions in the X–Y plane]
Step function: Y = 1 if X ≥ 0, Y = 0 if X < 0
Sign function: Y = +1 if X ≥ 0, Y = -1 if X < 0
Sigmoid function: Y = 1 / (1 + e^(-X))
Linear function: Y = X
Controls when unit is “active” or “inactive”
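A direct Python transcription of these four functions, using the ≥ 0 convention from the definitions above:

import math

def step(x):
    return 1 if x >= 0 else 0       # step: 1 for x >= 0, else 0

def sign(x):
    return 1 if x >= 0 else -1      # sign: +1 for x >= 0, else -1

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # smooth squashing into (0, 1)

def linear(x):
    return x                        # identity: output equals activation

for f in (step, sign, sigmoid, linear):
    print(f.__name__, f(-2.0), f(0.0), f(2.0))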
10. AN ARTIFICIAL NEURON- SUMMARY SO FAR
Receives n inputs
Multiplies each input by its weight
Applies an activation function to the sum of the weighted inputs
Outputs the result
[Figure: http://www-cse.uta.edu/~cook/ai1/lectures/figures/neuron.jpg]
12. A MOTIVATING EXAMPLE
Each day you get lunch at the cafeteria.
Your diet consists of fish, chips, and drink.
You get several portions of each
The cashier only tells you the total price of the meal
After several days, you should be able to figure out the price of each portion.
Each meal price gives a linear constraint on the prices of the portions:
price = x_fish·w_fish + x_chips·w_chips + x_drink·w_drink
13. SOLVING THE PROBLEM
The prices of the portions are like the weights of a linear neuron:
w = (w_fish, w_chips, w_drink)
We will start with guesses for the weights and then adjust the guesses to give a better fit to the prices given by the cashier.
14. THE CASHIER’S BRAIN
[Figure: a linear neuron – inputs: 2 portions of fish, 5 portions of chips, 3 portions of drink; weights (the true prices per portion): 150, 50, 100]
Price of meal = 2·150 + 5·50 + 3·100 = 850
15. A MODEL OF THE CASHIER'S BRAIN WITH ARBITRARY INITIAL WEIGHTS
[Figure: the same linear neuron – inputs: 2 portions of fish, 5 portions of chips, 3 portions of drink; guessed weights: 50, 50, 50]
Price of meal = 2·50 + 5·50 + 3·50 = 500
Residual error = 850 - 500 = 350
Apply the learning rule and update the weights
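A minimal sketch of this guess-and-adjust loop in Python, using the incremental delta rule from the gradient-descent slides further below; the learning rate and iteration count are illustrative assumptions. Note that a single observed meal does not determine the prices uniquely, so the loop settles on a pricing consistent with this meal (not necessarily the true prices); observing several different meals is what pins the true prices down.

def delta_rule_step(weights, inputs, target, lr):
    output = sum(w * x for w, x in zip(weights, inputs))  # linear neuron output
    error = target - output                               # residual error
    return [w + lr * error * x for w, x in zip(weights, inputs)]

weights = [50.0, 50.0, 50.0]       # arbitrary initial guesses: fish, chips, drink
inputs, target = [2, 5, 3], 850    # 2 fish + 5 chips + 3 drinks cost 850

for _ in range(100):
    weights = delta_rule_step(weights, inputs, target, lr=0.01)
print([round(w, 1) for w in weights])  # ~[68.4, 96.1, 77.6]: fits this meal, not the true prices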
16. PERCEPTRON
In 1958, Frank Rosenblatt introduced a training
algorithm that provided the first procedure for
training a simple ANN: a perceptron.
[Figure: a two-input perceptron – inputs x1 and x2, weighted by w1 and w2, feed a linear combiner; a hard limiter with a threshold then produces the output Y]
17. PERCEPTRON
A perceptron takes several inputs, x1, x2, …, and produces a single binary output.
The model consists of a linear combiner followed by a hard limiter.
The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and -1 if it is negative (1/0 in some models).
y = sgn( Σ(i=1..2) wi xi + θ ) = sgn( w1 x1 + w2 x2 + θ )
sgn(s) = 1 if s > 0, -1 otherwise
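The same model as a Python sketch of the two-input case (θ is kept as an explicit bias term; all values are supplied by the caller):

def perceptron(x, w, theta):
    s = sum(wi * xi for wi, xi in zip(w, x)) + theta  # linear combiner
    return 1 if s > 0 else -1                         # hard limiter: sgn(s)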
20. PERCEPTRON LEARNING
A perceptron (threshold unit) can learn anything
that it can represent (i.e. anything separable with a
hyperplane)
Example – the OR function (with 0/1 outputs) is linearly separable:
X1 X2 Y
0 0 0
0 1 1
1 0 1
1 1 1
21. OR FUNCTION
The two-input perceptron can implement the OR function when we set the weights: w0 = -0.3, w1 = w2 = 0.5
Decision hyperplane:
w0 + w1 x1 + w2 x2 = 0
-0.3 + 0.5 x1 + 0.5 x2 = 0
[Figure: the line -0.3 + 0.5 x1 + 0.5 x2 = 0 in the x1–x2 plane, separating the single negative point (0, 0) from the three positive points]
Training Data:
X1 X2 Y
0 0 -1
0 1 +1
1 0 +1
1 1 +1
23. A SINGLE PERCEPTRON CAN BE USED TO REPRESENT MANY BOOLEAN FUNCTIONS.
AND FUNCTION:
Decision hyperplane:
w0 + w1 x1 + w2 x2 = 0
-0.8 + 0.5 x1 + 0.5 x2 = 0
[Figure: the line -0.8 + 0.5 x1 + 0.5 x2 = 0 in the x1–x2 plane, separating the single positive point (1, 1) from the three negative points]
Training Examples:
X1 X2 Y
0 0 -1
0 1 -1
1 0 -1
1 1 +1
Test Results:
X1 X2 w0+ΣWiXi Y
0 0 -0.8 -1
0 1 -0.3 -1
1 0 -0.3 -1
1 1 0.2 +1
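A quick sketch that reproduces the test results above by plugging the slides' OR and AND weight settings into a two-input perceptron:

def perceptron2(x1, x2, w0, w1, w2):
    s = w0 + w1 * x1 + w2 * x2      # weighted sum including bias weight w0
    return 1 if s > 0 else -1       # hard limiter

settings = {"OR": (-0.3, 0.5, 0.5), "AND": (-0.8, 0.5, 0.5)}
for name, (w0, w1, w2) in settings.items():
    for x1 in (0, 1):
        for x2 in (0, 1):
            print(name, x1, x2, "->", perceptron2(x1, x2, w0, w1, w2))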
24. XOR FUNCTION
A Perceptron cannot represent Exclusive OR since
it is not linearly separable.
X1 X2 Y
0 0 -1
0 1 +1
1 0 +1
1 1 -1
XOR Function
25. XOR FUNCTION
It is impossible to implement the XOR function with a single perceptron
Two perceptrons?
X1 X2 Y
0 0 -1
0 1 +1
1 0 +1
1 1 -1
XOR Function
26. 2D PLOT OF BASIC LOGICAL OPERATORS
[Figure: 2D plots in the x1–x2 plane of (a) AND (x1 ∧ x2), (b) OR (x1 ∨ x2), and (c) Exclusive-OR (x1 ⊕ x2)]
A perceptron can learn the operations
AND and OR, but not Exclusive-OR.
27. PERCEPTRON
The aim of the perceptron is to classify inputs, x1, x2, …, xn, into one of two classes, say A1 and A2.
In the case of an elementary perceptron, the n-dimensional space is divided by a hyperplane into two decision regions. The hyperplane is defined by the function:
Σ(i=1..n) xi wi = 0
28. LINEAR SEPARABILITY WITH PERCEPTRON
[Figure: (a) a two-input perceptron – the line x1w1 + x2w2 = 0 separates Class A1 from Class A2 in the x1–x2 plane; (b) a three-input perceptron – the plane x1w1 + x2w2 + x3w3 = 0 divides the x1–x2–x3 space into two decision regions]
31. TRAINING RULE DERIVATION – GRADIENT DESCENT
Objective: find the values of the weights that minimize the error function
O(d) is the observed output and T(d) is the target output for training example d
E = (1/2) Σ(d=1..m) (T(d) - O(d))²
O(d) = w0 + w1 x1(d) + w2 x2(d) + ... + wn xn(d)
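The slides jump from this error function to the update rule on the next slide; the connecting step is the usual gradient computation, sketched here using the definitions above:

∂E/∂wi = ∂/∂wi [ (1/2) Σ(d=1..m) (T(d) - O(d))² ] = -Σ(d=1..m) (T(d) - O(d)) xi(d)

Stepping downhill with learning rate η gives Δwi = -η ∂E/∂wi = η Σ(d=1..m) (T(d) - O(d)) xi(d), which is exactly the batch update on the next slide.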
32. BATCH GRADIENT DESCENT
Gradient-Descent(training_examples, η)
Each training example is a pair of the form <(x1,…,xn), t>, where (x1,…,xn) is the vector of input values, t is the target output value, and η is the learning rate (e.g. 0.1)
Initialize each wi to some small random value
Until the termination condition is met, do:
  Initialize each Δwi to zero
  For each <(x1,…,xn), t> in training_examples, do:
    Input the instance (x1,…,xn) to the linear unit and compute the output o
    For each linear unit weight wi, do:
      Δwi = Δwi + η (t - o) xi
  For each linear unit weight wi, do:
    wi = wi + Δwi
Equivalently, the accumulated update for each weight is Δwi = η Σ(d∈D) (td - od) xid
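A runnable Python sketch of this batch procedure for a linear unit; the cashier dataset, learning rate, and epoch count are illustrative assumptions (the three meals are consistent with per-portion prices of 150, 50, and 100):

import random

def batch_gradient_descent(examples, eta=0.03, epochs=20000):
    n = len(examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]  # small random init
    for _ in range(epochs):
        delta = [0.0] * n                             # initialize each Δwi to zero
        for x, t in examples:
            o = sum(wi * xi for wi, xi in zip(w, x))  # linear unit output
            for i in range(n):
                delta[i] += eta * (t - o) * x[i]      # accumulate Δwi
        w = [wi + di for wi, di in zip(w, delta)]     # wi = wi + Δwi
    return w

meals = [([2, 5, 3], 850), ([1, 2, 1], 350), ([3, 1, 2], 700)]
print([round(p) for p in batch_gradient_descent(meals)])  # converges to [150, 50, 100]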
33. INCREMENTAL GRADIENT DESCENT
The gradient descent training rule updates the weights by summing over all the training examples
Stochastic gradient descent approximates this by updating the weights incrementally
Calculate the error and update the weights after each example
34. INCREMENTAL GRADIENT DESCENT
Gradient-Descent(training_examples, η)
Each training example is a pair of the form <(x1,…,xn), t>, where (x1,…,xn) is the vector of input values, t is the target output value, and η is the learning rate (e.g. 0.1)
Initialize each wi to some small random value
Until the termination condition is met, do:
  For each <(x1,…,xn), t> in training_examples, do:
    Input the instance (x1,…,xn) to the linear unit and compute the output o
    For each linear unit weight wi, do:
      wi = wi + Δwi, where Δwi = η (t - o) xi
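The incremental variant as a Python sketch: identical setup to the batch version above, but the weights change after every example instead of once per pass (dataset and parameters are the same illustrative assumptions):

import random

def incremental_gradient_descent(examples, eta=0.03, epochs=20000):
    n = len(examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):
        for x, t in examples:                         # one update per example
            o = sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
    return w

meals = [([2, 5, 3], 850), ([1, 2, 1], 350), ([3, 1, 2], 700)]
print([round(p) for p in incremental_gradient_descent(meals)])  # converges to [150, 50, 100]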
49. MULTI-LAYER PERCEPTRON - MLP
Minsky & Papert (1969) offered a solution to the XOR problem by combining perceptron unit responses using a second layer of units
Piecewise linear classification using an MLP with threshold (perceptron) units
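A minimal Python sketch of that two-layer idea. The hidden weights are illustrative choices, not from the slides (the OR unit reuses the weights from slide 21; the NAND and AND weights are assumptions): one hidden unit computes OR, the other NAND, and the output unit ANDs them together, which yields XOR.

def step(s):
    return 1 if s > 0 else 0

def xor_mlp(x1, x2):
    h1 = step(-0.3 + 0.5 * x1 + 0.5 * x2)    # hidden unit 1: OR (weights from slide 21)
    h2 = step(1.3 - 1.0 * x1 - 1.0 * x2)     # hidden unit 2: NAND (fires unless both inputs are 1)
    return step(-1.5 + 1.0 * h1 + 1.0 * h2)  # output unit: h1 AND h2 = XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))  # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0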