Carla P. Gomes, CS4700
Neural Networks: Basic Concepts

(Figure: a network mapping inputs Input 0, Input 1, ..., Input n to outputs Output 0, Output 1, ..., Output m.)
A Neural Network maps a set of inputs to a set of outputs.
The number of inputs/outputs is variable.
The network itself is composed of an arbitrary number of nodes, or units, connected by links, with an arbitrary topology.
A link from unit i to unit j serves to propagate the activation ai from i to j, and it has a weight Wi,j.
What can a neural network do?
Compute a known function / Approximate an unknown function
Pattern Recognition / Signal Processing
Learn to do any of the above
Different types of nodes
An Artificial Neuron
Node or Unit: a Mathematical Abstraction

(Figure: an artificial neuron (node or unit), processing unit i, with input edges and output edges, each carrying weights that can be positive or negative and that change over time through learning.)
Input function (in_i): the weighted sum of the unit's inputs, including the fixed input a0:
$$in_i = \sum_{j=0}^{n} W_{j,i}\, a_j$$
→ a processing element producing an output based on a function of its inputs.

Activation function (g): applied to the input function (typically non-linear).

Output:
$$a_i = g(in_i) = g\Big(\sum_{j=0}^{n} W_{j,i}\, a_j\Big)$$
Note: the fixed input a0 = −1 and the bias weight W0,i are a convention; some authors use instead, e.g., a0 = 1 and weight −W0,i.
Activation Functions

(a) Threshold activation function → a step function or threshold function (outputs 1 when the input is positive; 0 otherwise).
(b) Sigmoid (or logistic function) activation function (key advantage: differentiable).
(c) Sign function: +1 if the input is positive, otherwise −1.

→ Changing the bias weight W0,i moves the threshold location.
These functions have a threshold (either hard or soft) at zero.
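For reference, the three functions can be written as follows (standard definitions matching the descriptions above; on the original slide they were shown only graphically):

$$\text{step}(in) = \begin{cases} 1 & \text{if } in > 0 \\ 0 & \text{otherwise} \end{cases}
\qquad
\text{sigmoid}(in) = \frac{1}{1 + e^{-in}}
\qquad
\text{sign}(in) = \begin{cases} +1 & \text{if } in > 0 \\ -1 & \text{otherwise} \end{cases}$$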
Threshold Activation Function
(Figure: a unit's input edges, each with a weight that can be positive or negative and change over time through learning.)

The unit fires when the weighted sum of its inputs is non-negative:
$$in_i = \sum_{j=0}^{n} W_{j,i}\, a_j \ge 0$$

Separating out the fixed input,
$$in_i = \sum_{j=1}^{n} W_{j,i}\, a_j + W_{0,i}\, a_0 \ge 0;$$
defining $a_0 = -1$ and $W_{0,i} = t_i$, we get
$$\sum_{j=1}^{n} W_{j,i}\, a_j \ge t_i,$$
where $t_i$ is the threshold value associated with unit i.
Implementing Boolean Functions

Units with a threshold activation function can act as logic gates; we can use these units to compute Boolean functions of their inputs.

Activation of threshold units when:
$$\sum_{j=1}^{n} W_{j,i}\, a_j \ge W_{0,i}$$
Boolean AND

input x1   input x2   output
0          0          0
0          1          0
1          0          0
1          1          1

(Figure: a threshold unit with inputs x1 and x2, weights w1 = 1 and w2 = 1, and fixed input −1 with bias weight W0 = 1.5.)

Activation of threshold units when:
$$\sum_{j=1}^{n} W_{j,i}\, a_j \ge W_{0,i}$$
Boolean OR

input x1   input x2   output
0          0          0
0          1          1
1          0          1
1          1          1

(Figure: a threshold unit with inputs x1 and x2, weights w1 = 1 and w2 = 1, and fixed input −1 with bias weight w0 = 0.5.)

Activation of threshold units when:
$$\sum_{j=1}^{n} W_{j,i}\, a_j \ge W_{0,i}$$
Inverter (Boolean NOT)

input x1   output
0          1
1          0

(Figure: a threshold unit with input x1, weight w1 = −1, and fixed input −1 with bias weight w0 = −0.5.)

Activation of threshold units when:
$$\sum_{j=1}^{n} W_{j,i}\, a_j \ge W_{0,i}$$

So, units with a threshold activation function can act as logic gates, given the appropriate input and bias weights.
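To check the three gates concretely, here is a minimal Python sketch (the helper name threshold_unit and the framing are illustrative, not from the slides) using exactly the weights shown above:

```python
# Threshold units acting as logic gates, with the slide's weights.

def threshold_unit(weights, bias, inputs):
    """Fire (return 1) iff sum_j w_j * a_j >= W_0 (the bias/threshold).

    Equivalent to adding a fixed input a0 = -1 with weight `bias` and
    testing whether the full weighted sum is >= 0.
    """
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= bias else 0

def AND(x1, x2): return threshold_unit([1, 1], 1.5, [x1, x2])
def OR(x1, x2):  return threshold_unit([1, 1], 0.5, [x1, x2])
def NOT(x1):     return threshold_unit([-1], -0.5, [x1])

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", AND(x1, x2), "OR:", OR(x1, x2))
print("NOT 0:", NOT(0), " NOT 1:", NOT(1))
```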
Network Structures
Acyclic or Feed-forward networks
Activation flows from input layer to
output layer
– single-layer perceptrons
– multi-layer perceptrons
Recurrent networks
– Feed the outputs back into their own inputs
Network is a dynamical system
(stable state, oscillations, chaotic behavior)
Response of the network depends on initial state
– Can support short-term memory
– More difficult to understand
Feed-forward networks implement functions,
have no internal state (only weights).
Our focus
Feed-forward Network:
Represents a function of Its Input
(Figure: a network with two input units, two hidden units, and one output unit.)
By adjusting the weights we get different functions:
that is how learning is done in neural networks!
Each unit receives input only
from units in the immediately
preceding layer.
Given an input vector x = (x1, x2), the activations of the input units are set to the values of the input vector, i.e., (a1, a2) = (x1, x2), and the network computes (numbering the input units 1 and 2, the hidden units 3 and 4, and the output unit 5; bias units omitted for simplicity):
$$a_5 = g\big(W_{3,5}\,a_3 + W_{4,5}\,a_4\big) = g\Big(W_{3,5}\, g\big(W_{1,3}a_1 + W_{2,3}a_2\big) + W_{4,5}\, g\big(W_{1,4}a_1 + W_{2,4}a_2\big)\Big)$$
Thus a feed-forward network computes a parameterized family of functions hW(x); the weights are the parameters of the function.
Note: the input layer in general does not include computing units.
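To make the composition concrete, here is a minimal Python sketch of the forward pass (an illustration, not from the slides; the weight values are arbitrary, and the sigmoid is one of the activation choices above):

```python
import math

def g(x):
    """Sigmoid activation (one of the activation functions discussed earlier)."""
    return 1.0 / (1.0 + math.exp(-x))

def h(W, x1, x2):
    """Output of the 2-2-1 feed-forward network (bias units omitted, as in the slide)."""
    a3 = g(W["13"] * x1 + W["23"] * x2)    # hidden unit 3
    a4 = g(W["14"] * x1 + W["24"] * x2)    # hidden unit 4
    return g(W["35"] * a3 + W["45"] * a4)  # output unit 5

# Different weights W give different functions h_W(x):
W = {"13": 1.0, "23": -1.0, "14": -1.0, "24": 1.0, "35": 2.0, "45": 2.0}
print(h(W, 0.0, 1.0), h(W, 1.0, 1.0))
```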
Perceptron
Perceptron
– Invented by Frank Rosenblatt in 1957 in an
attempt to understand human memory, learning,
and cognitive processes.
– The first computational neural network model, with a remarkable learning algorithm:
• If a function can be represented by a perceptron, the learning algorithm is guaranteed to quickly converge to it!
– Became the foundation of pattern recognition
research
Rosenblatt &
Mark I Perceptron:
the first machine that could
"learn" to recognize and
identify optical patterns.
Cornell Aeronautical Laboratory
One of the earliest and most influential neural networks:
An important milestone in AI.
Perceptron

Rosenblatt, Frank (Cornell Aeronautical Laboratory at Cornell University). "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain." Psychological Review, Vol. 65, No. 6, pp. 386-408, November 1958.
Single Layer Feed-forward Neural Networks
Perceptrons
Single-layer neural network (perceptron network)
A network with all the inputs connected directly to the outputs
Since each output unit is
independent of the others,
we can limit our study
to single output perceptrons.
– Output units all operate separately: no shared weights.
Carla P. Gomes
CS4700
Perceptron to Learn to Identify Digits
(From Pat. Winston, MIT)
Seven line segments
are enough to produce
all 10 digits
5
3
1
4
6
0
2
Digit x0 x1 x2 x3 x4 x5 x6
0 0 1 1 1 1 1 1
9 1 1 1 1 1 1 0
8 1 1 1 1 1 1 1
7 0 0 1 1 1 0 0
6 1 1 1 0 1 1 1
5 1 1 1 0 1 1 0
4 1 1 0 1 1 0 0
3 1 0 1 1 1 1 0
2 1 0 1 1 0 1 1
1 0 0 0 1 1 0 0
Perceptron to Learn to Identify Digits (from Patrick Winston, MIT)

A vision system reports which of the seven segments in the display are on, thereby producing the inputs for the perceptron.

(Figure: the same seven-segment display, segments numbered 0 through 6.)
Perceptron to Learn to Identify Digit 0

A vision system reports which of the seven segments in the display are on, thereby producing the inputs for the perceptron.

(Figure: the seven-segment display feeding a perceptron with one input per segment, x0 ... x6, plus the fixed input x7, each with its own weight.)

Digit  x0 x1 x2 x3 x4 x5 x6  x7 (fixed input)
0      0  1  1  1  1  1  1   1

Sum > 0 → output = 1; else output = 0.

When the input digit is 0, what's the value of the sum?
Single-Layer Feed-forward Neural Networks: Perceptrons

A two-input perceptron unit with a sigmoid (logistic) activation function: adjusting the weights moves the location, orientation, and steepness of the cliff.
Perceptron Learning: Intuition

Weight update:
→ Inputs Ij (j = 1, 2, ..., n)
→ Single output O; target output T.

Consider some initial weights and define the example error: Err = T − O.
Now just move the weights in the right direction!
If the error is positive, then we need to increase O:
Err > 0 → need to increase O;
Err < 0 → need to decrease O.

Each input unit j contributes Wj Ij to the total input:
if Ij is positive, increasing Wj tends to increase O;
if Ij is negative, decreasing Wj tends to increase O.

So, use:
Wj ← Wj + α × Ij × Err

Perceptron Learning Rule (Rosenblatt, 1960). α is the learning rate.
Perceptron Learning: Simple Example

Let's consider an example (adapted from Patrick Winston's book, MIT).

Framework and notation:
0/1 signals.
Input vector: $X = \langle x_0, x_1, x_2, \ldots, x_n \rangle$
Weight vector: $W = \langle w_0, w_1, w_2, \ldots, w_n \rangle$
$x_0 = 1$ and $\theta = -w_0$ simulate the threshold.
O is the output (0 or 1) (single output).

Threshold function:
$$S = \sum_{k=0}^{n} w_k x_k; \qquad \text{if } S > 0 \text{ then } O = 1 \text{, else } O = 0$$

Learning rate = 1.
Perceptron Learning: Simple Example

Set of examples: each example is a pair $(\vec{x}_i, y_i)$, i.e., an input vector and a label y (0 or 1).

Learning procedure, called the "error correcting method":
• Start with an all-zero weight vector.
• Cycle (repeatedly) through the examples, and for each example do:
– If the perceptron outputs 0 while it should be 1, add the input vector to the weight vector.
– If the perceptron outputs 1 while it should be 0, subtract the input vector from the weight vector.
– Otherwise do nothing.

(This is the rule Wj ← Wj + α × Ij × Err with Err = T − O and α = 1.)

Intuitively correct: e.g., if the output is 0 but it should be 1, the weights are increased!

This procedure provably converges (in a polynomial number of steps) if the function is representable by a perceptron (i.e., linearly separable).
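As a sanity check, the whole procedure fits in a few lines of Python. This is a minimal sketch (variable names and the epoch cap are illustrative, not from the slides); it reproduces the epoch-by-epoch weights of the worked OR example and trace table that follow:

```python
examples = [            # ((x0, x1, x2), label), with x0 = 1 the fixed input
    ((1, 0, 0), 0),
    ((1, 0, 1), 1),
    ((1, 1, 0), 1),
    ((1, 1, 1), 1),
]

w = [0, 0, 0]           # start with the all-zero weight vector

for epoch in range(1, 10):
    changed = False
    for x, label in examples:
        s = sum(wk * xk for wk, xk in zip(w, x))
        o = 1 if s > 0 else 0
        if o == 0 and label == 1:    # output 0, should be 1: add the input
            w = [wk + xk for wk, xk in zip(w, x)]
            changed = True
        elif o == 1 and label == 0:  # output 1, should be 0: subtract the input
            w = [wk - xk for wk, xk in zip(w, x)]
            changed = True
    print("after epoch", epoch, "weights =", w)
    if not changed:                  # a full epoch with no updates: converged
        break
```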
Perceptron Learning:
Simple Example
Consider learning the logical OR function.
Our examples are:
Sample x0 x1 x2 label
1 1 0 0 0
2 1 0 1 1
3 1 1 0 1
4 1 1 1 1
Activation function:
$$S = \sum_{k=0}^{n} w_k x_k; \qquad \text{if } S > 0 \text{ then } O = 1 \text{, else } O = 0$$
Perceptron Learning:
Simple Example
We'll use a single perceptron with three inputs I0, I1, I2 (I0 = 1 is the fixed input) and weights w0, w1, w2.
We'll start with all weights 0: W = <0,0,0>.

Example 1 I = <1 0 0> label = 0 W = <0,0,0>
Perceptron: (1·0 + 0·0 + 0·0 = 0, S = 0) output → 0.
It classifies it as 0, so correct; do nothing.

Example 2 I = <1 0 1> label = 1 W = <0,0,0>
Perceptron: (1·0 + 0·0 + 1·0 = 0) output → 0.
It classifies it as 0 while it should be 1, so we add the input to the weights:
W = <0,0,0> + <1,0,1> = <1,0,1>

(Error correcting method: if the perceptron is 0 while it should be 1, add the input vector to the weight vector; if it is 1 while it should be 0, subtract the input vector from the weight vector; otherwise do nothing.)
Example 3 I = <1 1 0> label = 1 W = <1,0,1>
Perceptron: (1·1 + 1·0 + 0·1 = 1 > 0) output = 1.
It classifies it as 1, correct; do nothing. W = <1,0,1>

Example 4 I = <1 1 1> label = 1 W = <1,0,1>
Perceptron: (1·1 + 1·0 + 1·1 = 2 > 0) output = 1.
It classifies it as 1, correct; do nothing. W = <1,0,1>
Perceptron Learning: Simple Example

Epoch 2, through the examples, W = <1,0,1>.

Example 1 I = <1,0,0> label = 0 W = <1,0,1>
Perceptron: (1·1 + 0·0 + 0·1 = 1 > 0) output → 1.
It classifies it as 1 while it should be 0, so subtract the input from the weights:
W = <1,0,1> − <1,0,0> = <0,0,1>

Example 2 I = <1 0 1> label = 1 W = <0,0,1>
Perceptron: (1·0 + 0·0 + 1·1 = 1 > 0) output → 1.
It classifies it as 1, so correct; do nothing.
Example 3 I = <1 1 0> label = 1 W = <0,0,1>
Perceptron: (1·0 + 1·0 + 0·1 = 0) output = 0.
It classifies it as 0 while it should be 1, so add the input to the weights:
W = <0,0,1> + <1,1,0> = <1,1,1>

Example 4 I = <1 1 1> label = 1 W = <1,1,1>
Perceptron: (1·1 + 1·1 + 1·1 = 3 > 0) output = 1.
It classifies it as 1, correct; do nothing. W = <1,1,1>
Perceptron Learning: Simple Example

Epoch 3, through the examples, W = <1,1,1>.

Example 1 I = <1,0,0> label = 0 W = <1,1,1>
Perceptron: (1·1 + 0·1 + 0·1 = 1 > 0) output → 1.
It classifies it as 1 while it should be 0, so subtract the input from the weights:
W = <1,1,1> − <1,0,0> = <0,1,1>

Example 2 I = <1 0 1> label = 1 W = <0,1,1>
Perceptron: (1·0 + 0·1 + 1·1 = 1 > 0) output → 1.
It classifies it as 1, so correct; do nothing.
Example 3 I = <1 1 0> label = 1 W = <0,1,1>
Perceptron: (1·0 + 1·1 + 0·1 = 1 > 0) output = 1.
It classifies it as 1, correct; do nothing.

Example 4 I = <1 1 1> label = 1 W = <0,1,1>
Perceptron: (1·0 + 1·1 + 1·1 = 2 > 0) output = 1.
It classifies it as 1, correct; do nothing. W = <0,1,1>
Perceptron Learning: Simple Example

Epoch 4, through the examples, W = <0,1,1>.

Example 1 I = <1,0,0> label = 0 W = <0,1,1>
Perceptron: (1·0 + 0·1 + 0·1 = 0) output → 0.
It classifies it as 0, so correct; do nothing.

(Figure: the learned OR unit, with weights W0 = 0, W1 = 1, W2 = 1 on inputs I0, I1, I2.)

So the final weight vector W = <0,1,1> classifies all examples correctly, and the perceptron has learned the function!

Aside: in more realistic cases the bias (W0) will not be 0. (This was just a toy example!) Also, in general, there are many more inputs (100 to 1000).
Trace of the error-correcting method on the OR examples:

Epoch  Example     x0 x1 x2  Target  w0 w1 w2  Output  Error  New w0  New w1  New w2
1      example 1   1  0  0   0       0  0  0   0        0     0       0       0
       example 2   1  0  1   1       0  0  0   0        1     1       0       1
       example 3   1  1  0   1       1  0  1   1        0     1       0       1
       example 4   1  1  1   1       1  0  1   1        0     1       0       1
2      example 1   1  0  0   0       1  0  1   1       -1     0       0       1
       example 2   1  0  1   1       0  0  1   1        0     0       0       1
       example 3   1  1  0   1       0  0  1   0        1     1       1       1
       example 4   1  1  1   1       1  1  1   1        0     1       1       1
3      example 1   1  0  0   0       1  1  1   1       -1     0       1       1
       example 2   1  0  1   1       0  1  1   1        0     0       1       1
       example 3   1  1  0   1       0  1  1   1        0     0       1       1
       example 4   1  1  1   1       0  1  1   1        0     0       1       1
4      example 1   1  0  0   0       0  1  1   0        0     0       1       1
Derivation of a Learning Rule for Perceptrons: Minimizing Squared Errors

Threshold perceptrons have some advantages; in particular:
→ a simple learning algorithm that fits a threshold perceptron to any linearly separable training set.

Key idea: learn by adjusting the weights to reduce the error on the training set
→ update the weights repeatedly (over epochs) for each example.

We'll use the sum of squared errors (e.g., as used in linear regression), a classical error measure.

Learning is an optimization search problem in weight space.
Derivation of a Learning Rule for Perceptrons: Minimizing Squared Errors

Let S = {(xi, yi): i = 1, 2, ..., m} be a training set. (Note: xi is a vector of inputs, and yi is the true output.)

Let hw be the perceptron classifier represented by the weight vector w.

Definition:
$$E = \text{Squared Error} = \frac{1}{2}\,Err^2 = \frac{1}{2}\big(y - h_{\mathbf{w}}(\mathbf{x})\big)^2$$
The squared error for a single training example with input x and true output y is as defined above, where hw(x) is the output of the perceptron on the example and y is the true output value.

We can use gradient descent to reduce the squared error by calculating the partial derivatives of E with respect to each weight.

Note: g'(in) is the derivative of the activation function. For the sigmoid, g' = g(1 − g). For threshold perceptrons, where g'(in) is undefined, the original perceptron learning rule simply omits it.
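For completeness, the chain-rule computation this alludes to is the standard one (spelled out here; the slide leaves it implicit):

$$\frac{\partial E}{\partial w_j}
= Err \cdot \frac{\partial\, Err}{\partial w_j}
= Err \cdot \frac{\partial}{\partial w_j}\Big(y - g\Big(\sum_{k=0}^{n} w_k x_k\Big)\Big)
= -\,Err \cdot g'(in)\, x_j,$$

so stepping opposite the gradient gives the update $w_j \leftarrow w_j + \alpha \cdot Err \cdot g'(in) \cdot x_j$.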
Gradient descent algorithm → we want to reduce E: for each weight wi, change the weight in the direction of steepest descent:
Wj ← Wj + α × Ij × Err (α = learning rate)

Intuitively:
Err = y − hW(x) positive → the output is too small, so weights are increased for positive inputs and decreased for negative inputs;
Err = y − hW(x) negative → the opposite.
Perceptron Learning: Intuition

The rule is intuitively correct!
Greedy search: gradient descent through weight space!
Surprising proof of convergence: weight space has no local minima!
With enough examples, it will find the target function! (provided α is not too large)
(Figure: gradient descent in weight space. From T. M. Mitchell, Machine Learning.)
Perceptron learning rule:
1. Start with random weights, w = (w1, w2, ..., wn).
2. Select a training example (x, y) ∈ S.
3. Run the perceptron with input x and weights w to obtain the output g(in).
4. Let α be the training rate (a user-set parameter), and update the weights:
$$w_i \leftarrow w_i + \Delta w_i, \qquad \text{where } \Delta w_i = \alpha\,\big(y - g(in)\big)\,g'(in)\,x_i$$
5. Go to 2.

Epoch → a cycle through the examples. The stochastic gradient method selects examples randomly from the training set rather than cycling through them. Epochs are repeated until some stopping criterion is reached, typically that the weight changes have become very small.

(Compare the earlier form of the rule: Wj ← Wj + α × Ij × Err.)
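A minimal Python sketch of this rule for a sigmoid unit (the OR dataset, learning rate, and epoch count are illustrative assumptions, not from the slides):

```python
import math, random

def g(x):
    return 1.0 / (1.0 + math.exp(-x))      # sigmoid, with g' = g(1 - g)

def train(examples, alpha=0.5, epochs=2000):
    n = len(examples[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]   # 1. random weights
    for _ in range(epochs):                             # repeat over epochs
        for x, y in examples:                           # 2. select an example
            in_ = sum(wi * xi for wi, xi in zip(w, x))  # 3. run the perceptron
            out = g(in_)
            delta = alpha * (y - out) * out * (1 - out) # alpha*(y - g)*g'(in)
            w = [wi + delta * xi for wi, xi in zip(w, x)]  # 4. update weights
    return w

# Usage: learn OR, with x0 = 1 as the fixed input (as in the earlier example).
data = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]
w = train(data)
print([round(g(sum(wi * xi for wi, xi in zip(w, x)))) for x, _ in data])  # [0, 1, 1, 1]
```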
Perceptron Learning:
Gradient Descent Learning Algorithm
Expressiveness of Perceptrons

What hypothesis space can a perceptron represent?
Even quite complex Boolean functions, such as the majority function.
But can it represent any arbitrary Boolean function?
Expressiveness of Perceptrons

A threshold perceptron returns 1 iff the weighted sum of its inputs (including the bias) is positive, i.e.:
$$\sum_{j=0}^{n} W_j\, x_j > 0, \quad \text{i.e., } \mathbf{W} \cdot \mathbf{x} > 0,$$
i.e., iff the input is on one side of the hyperplane the weights define.
This is a linear discriminant function, or linear decision surface: the weights determine the slope and the bias determines the offset.
Perceptron → linear separator.
Linear Separability

(Figure: + and − examples in the (x1, x2) plane, separated by a line.)

A perceptron can be used for classification: we can view the trained network as defining a "separation line". Consider an example with two inputs, x1 and x2. What is its equation?

$$w_1 x_1 + w_2 x_2 - w_0 = 0
\quad\Longleftrightarrow\quad
x_2 = -\frac{w_1}{w_2}\,x_1 + \frac{w_0}{w_2}$$
Linear Separability
(Figure: the OR function in the (x1, x2) plane is linearly separable.)
Linear Separability
(Figure: the AND function in the (x1, x2) plane is linearly separable.)
Linear Separability
(Figure: the XOR function in the (x1, x2) plane: not linearly separable.)

Minsky & Papert (1969)
Bad news: perceptrons can only represent linearly separable functions.
Linear Separability: XOR

Consider a threshold perceptron for the logical XOR function (two inputs). The unit outputs 1 when
$$w_1 x_1 + w_2 x_2 > T.$$

Our examples are:
Sample  x1  x2  label
1       0   0   0
2       1   0   1
3       0   1   1
4       1   1   0

Given our examples, we have the following inequalities for the perceptron:
From (1): 0 + 0 ≤ T → T ≥ 0
From (2): w1 + 0 > T → w1 > T
From (3): 0 + w2 > T → w2 > T
From (4): w1 + w2 ≤ T
But adding (2) and (3) gives w1 + w2 > 2T ≥ T: a contradiction.
So, XOR is not linearly separable.
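The proof above is fully general; purely as an illustration, a brute-force search over a small, arbitrary grid of weights and thresholds (a sketch; the grid and names are assumptions) finds separators for AND but none for XOR:

```python
from itertools import product

def find_separator(truth_table, grid):
    """Return (w1, w2, T) with (w1*x1 + w2*x2 > T) matching the table, or None."""
    for w1, w2, T in product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 > T) == bool(y) for (x1, x2), y in truth_table):
            return (w1, w2, T)
    return None

grid = [i / 2 for i in range(-8, 9)]  # candidate values -4.0, -3.5, ..., 4.0
AND_table = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_table = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print("AND:", find_separator(AND_table, grid))  # prints some (w1, w2, T)
print("XOR:", find_separator(XOR_table, grid))  # prints None
```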
Convergence of Perceptron Learning Algorithm

The perceptron converges to a consistent function, if…
… the training data are linearly separable,
… the step size α is sufficiently small,
… there are no "hidden" units.
A perceptron learns the majority function easily; decision-tree learning (DTL) is hopeless at it.
DTL learns the restaurant function easily; a perceptron cannot represent it.
Good news: adding a hidden layer allows more target functions to be represented. (Minsky & Papert, 1969)

More Related Content

What's hot

Applications in Machine Learning
Applications in Machine LearningApplications in Machine Learning
Applications in Machine LearningJoel Graff
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...Simplilearn
 
Machine learning basics
Machine learning basics Machine learning basics
Machine learning basics Akanksha Bali
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Muhammad Haroon
 
Object detection
Object detectionObject detection
Object detectionSomesh Vyas
 
Artificial intelligence- Logic Agents
Artificial intelligence- Logic AgentsArtificial intelligence- Logic Agents
Artificial intelligence- Logic AgentsNuruzzaman Milon
 
Edge detection of video using matlab code
Edge detection of video using matlab codeEdge detection of video using matlab code
Edge detection of video using matlab codeBhushan Deore
 
Edge Detection and Segmentation
Edge Detection and SegmentationEdge Detection and Segmentation
Edge Detection and SegmentationA B Shinde
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementJinwon Lee
 
Problem solving agents
Problem solving agentsProblem solving agents
Problem solving agentsMegha Sharma
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Simplilearn
 
Object detection presentation
Object detection presentationObject detection presentation
Object detection presentationAshwinBicholiya
 
Machine Learning and Applications
Machine Learning and ApplicationsMachine Learning and Applications
Machine Learning and ApplicationsGeeta Arora
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning Mohammad Junaid Khan
 
Simple Introduction to AutoEncoder
Simple Introduction to AutoEncoderSimple Introduction to AutoEncoder
Simple Introduction to AutoEncoderJun Lang
 

What's hot (20)

Neural networks
Neural networksNeural networks
Neural networks
 
Applications in Machine Learning
Applications in Machine LearningApplications in Machine Learning
Applications in Machine Learning
 
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...
 
Machine learning basics
Machine learning basics Machine learning basics
Machine learning basics
 
Problem Solving
Problem Solving Problem Solving
Problem Solving
 
Hopfield Networks
Hopfield NetworksHopfield Networks
Hopfield Networks
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)
 
Object detection
Object detectionObject detection
Object detection
 
Artificial intelligence- Logic Agents
Artificial intelligence- Logic AgentsArtificial intelligence- Logic Agents
Artificial intelligence- Logic Agents
 
Edge detection of video using matlab code
Edge detection of video using matlab codeEdge detection of video using matlab code
Edge detection of video using matlab code
 
Edge Detection and Segmentation
Edge Detection and SegmentationEdge Detection and Segmentation
Edge Detection and Segmentation
 
Yolo
YoloYolo
Yolo
 
PR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental ImprovementPR-207: YOLOv3: An Incremental Improvement
PR-207: YOLOv3: An Incremental Improvement
 
Neural Networks
Neural NetworksNeural Networks
Neural Networks
 
Problem solving agents
Problem solving agentsProblem solving agents
Problem solving agents
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
 
Object detection presentation
Object detection presentationObject detection presentation
Object detection presentation
 
Machine Learning and Applications
Machine Learning and ApplicationsMachine Learning and Applications
Machine Learning and Applications
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning
 
Simple Introduction to AutoEncoder
Simple Introduction to AutoEncoderSimple Introduction to AutoEncoder
Simple Introduction to AutoEncoder
 

Similar to Perceptron 2015.ppt

Deep learning simplified
Deep learning simplifiedDeep learning simplified
Deep learning simplifiedLovelyn Rose
 
Artificial Neural Networks (ANNs) focusing on the perceptron Algorithm.pptx
Artificial Neural Networks (ANNs) focusing on the perceptron Algorithm.pptxArtificial Neural Networks (ANNs) focusing on the perceptron Algorithm.pptx
Artificial Neural Networks (ANNs) focusing on the perceptron Algorithm.pptxMDYasin34
 
Neural network
Neural networkNeural network
Neural networkmarada0033
 
Neural network 20161210_jintaekseo
Neural network 20161210_jintaekseoNeural network 20161210_jintaekseo
Neural network 20161210_jintaekseoJinTaek Seo
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Networkssuserab4f3e
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdfnikola_tesla1
 
Perceptron (neural network)
Perceptron (neural network)Perceptron (neural network)
Perceptron (neural network)EdutechLearners
 
Perceptron Study Material with XOR example
Perceptron Study Material with XOR examplePerceptron Study Material with XOR example
Perceptron Study Material with XOR exampleGSURESHKUMAR11
 
lecture13-NN-basics.pptx
lecture13-NN-basics.pptxlecture13-NN-basics.pptx
lecture13-NN-basics.pptxAbijahRoseline1
 
lecture07.ppt
lecture07.pptlecture07.ppt
lecture07.pptbutest
 
SOFT COMPUTERING TECHNICS -Unit 1
SOFT COMPUTERING TECHNICS -Unit 1SOFT COMPUTERING TECHNICS -Unit 1
SOFT COMPUTERING TECHNICS -Unit 1sravanthi computers
 
soft computing
soft computingsoft computing
soft computingAMIT KUMAR
 
Deep learning: Mathematical Perspective
Deep learning: Mathematical PerspectiveDeep learning: Mathematical Perspective
Deep learning: Mathematical PerspectiveYounusS2
 

Similar to Perceptron 2015.ppt (20)

Deep learning simplified
Deep learning simplifiedDeep learning simplified
Deep learning simplified
 
Artificial Neural Networks (ANNs) focusing on the perceptron Algorithm.pptx
Artificial Neural Networks (ANNs) focusing on the perceptron Algorithm.pptxArtificial Neural Networks (ANNs) focusing on the perceptron Algorithm.pptx
Artificial Neural Networks (ANNs) focusing on the perceptron Algorithm.pptx
 
Lec10
Lec10Lec10
Lec10
 
DNN.pptx
DNN.pptxDNN.pptx
DNN.pptx
 
Neural network
Neural networkNeural network
Neural network
 
Neural network 20161210_jintaekseo
Neural network 20161210_jintaekseoNeural network 20161210_jintaekseo
Neural network 20161210_jintaekseo
 
Artificial Neural Network
Artificial Neural NetworkArtificial Neural Network
Artificial Neural Network
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
 
Perceptron (neural network)
Perceptron (neural network)Perceptron (neural network)
Perceptron (neural network)
 
19_Learning.ppt
19_Learning.ppt19_Learning.ppt
19_Learning.ppt
 
2-Perceptrons.pdf
2-Perceptrons.pdf2-Perceptrons.pdf
2-Perceptrons.pdf
 
Perceptron
PerceptronPerceptron
Perceptron
 
Perceptron Study Material with XOR example
Perceptron Study Material with XOR examplePerceptron Study Material with XOR example
Perceptron Study Material with XOR example
 
ANN.pptx
ANN.pptxANN.pptx
ANN.pptx
 
lecture13-NN-basics.pptx
lecture13-NN-basics.pptxlecture13-NN-basics.pptx
lecture13-NN-basics.pptx
 
10-Perceptron.pdf
10-Perceptron.pdf10-Perceptron.pdf
10-Perceptron.pdf
 
lecture07.ppt
lecture07.pptlecture07.ppt
lecture07.ppt
 
SOFT COMPUTERING TECHNICS -Unit 1
SOFT COMPUTERING TECHNICS -Unit 1SOFT COMPUTERING TECHNICS -Unit 1
SOFT COMPUTERING TECHNICS -Unit 1
 
soft computing
soft computingsoft computing
soft computing
 
Deep learning: Mathematical Perspective
Deep learning: Mathematical PerspectiveDeep learning: Mathematical Perspective
Deep learning: Mathematical Perspective
 

Recently uploaded

代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computationsit20ad004
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationBoston Institute of Analytics
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 

Recently uploaded (20)

代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computation
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health Classification
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 

Perceptron 2015.ppt

  • 1. Carla P. Gomes CS4700 Basic Concepts Neural Network Input 0 Input 1 Input n ... Output 0 Output 1 Output m ... A Neural Network maps a set of inputs to a set of outputs Number of inputs/outputs is variable The Network itself is composed of an arbitrary number of nodes or units, connected by links, with an arbitrary topology. A link from unit i to unit j serves to propagate the activation aj to j, and it has a weight Wij. What can a neural networks do? Compute a known function / Approximate an unknown function Pattern Recognition / Signal Processing Learn to do any of the above
  • 3. Carla P. Gomes CS4700 Output edges, each with weights (positive, negative, and change over time, learning) Input edges, each with weights (positive, negative, and change over time, learning) An Artificial Neuron Node or Unit: A Mathematical Abstraction Artificial Neuron, Node or unit , Processing Unit i j n j i j i a W in    0 , Input function(ini): weighted sum of its inputs, including fixed input a0.  a processing element producing an output based on a function of its inputs Activation function (g) applied to input function (typically non-linear). Output ) ( 0 , j n j i j i a W g a    Note: the fixed input and bias weight are conventional; some authors instead, e.g., or a0=1 and -W0i
  • 4. Carla P. Gomes CS4700 (a) Threshold activation function  a step function or threshold function (outputs 1 when the input is positive; 0 otherwise). (b) Sigmoid (or logistics function) activation function (key advantage: differentiable) (c) Sign function, +1 if input is positive, otherwise -1. Activation Functions  Changing the bias weight W0,i moves the threshold location. These functions have a threshold (either hard or soft) at zero.
  • 5. Carla P. Gomes CS4700 Threshold Activation Function Input edges, each with weights (positive, negative, and change over time, learning) ; 0 0 ,     j n j i j i a W in i=0 i=t i i i j n j i j w w a W get we a defining , 0 , 0 1 , 0 , 1        ; 0 0 , 0 1 ,       a w a w in i j n j i j i i threshold value associated with unit i i i i j n j i j w w a W get we a defining , 0 , 0 1 , 0 , 1        
  • 6. Carla P. Gomes CS4700 i n j j i j W a W , 0 1 ,    Implementing Boolean Functions Units with a threshold activation function can act as logic gates; we can use these units to compute Boolean function of its inputs. Activation of threshold units when:
  • 7. Carla P. Gomes CS4700 Boolean AND input x1 input x2 ouput 0 0 0 0 1 0 1 0 0 1 1 1 x2 x1 w2=1 w1=1 W0= 1.5 -1 i n j j i j W a W , 0 1 ,    Activation of threshold units when:
  • 8. Carla P. Gomes CS4700 Boolean OR input x1 input x2 ouput 0 0 0 0 1 1 1 0 1 1 1 1 x2 x1 w2=1 w1=1 w0= 0.5 -1 i n j j i j W a W , 0 1 ,    Activation of threshold units when:
  • 9. Carla P. Gomes CS4700 Inverter input x1 output 0 1 1 0 x1 w1= 1 -1 w0= -0.5 i n j j i j W a W , 0 1 ,    Activation of threshold units when: So, units with a threshold activation function can act as logic gates given the appropriate input and bias weights.
  • 10. Carla P. Gomes CS4700 Network Structures Acyclic or Feed-forward networks Activation flows from input layer to output layer – single-layer perceptrons – multi-layer perceptrons Recurrent networks – Feed the outputs back into own inputs Network is a dynamical system (stable state, oscillations, chaotic behavior) Response of the network depends on initial state – Can support short-term memory – More difficult to understand Feed-forward networks implement functions, have no internal state (only weights). Our focus
  • 11. Carla P. Gomes CS4700 Feed-forward Network: Represents a function of Its Input Two hidden units Two input units One Output By adjusting the weights we get different functions: that is how learning is done in neural networks! Each unit receives input only from units in the immediately preceding layer. Given an input vector x = (x1,x2), the activations of the input units are set to values of the input vector, i.e., (a1,a2)=(x1,x2), and the network computes: Feed-forward network computes a parameterized family of functions hW(x) (Bias unit omitted for simplicity) Note: the input layer in general does not include computing units. Weights are the parameters of the function
  • 12. Carla P. Gomes CS4700 Perceptron Perceptron – Invented by Frank Rosenblatt in 1957 in an attempt to understand human memory, learning, and cognitive processes. – The first neural network model by computation, with a remarkable learning algorithm: • If function can be represented by perceptron, the learning algorithm is guaranteed to quickly converge to the hidden function! – Became the foundation of pattern recognition research Rosenblatt & Mark I Perceptron: the first machine that could "learn" to recognize and identify optical patterns. Cornell Aeronautical Laboratory One of the earliest and most influential neural networks: An important milestone in AI.
  • 13. Carla P. Gomes CS4700 Perceptron ROSENBLATT, Frank. (Cornell Aeronautical Laboratory at Cornell University ) The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. In, Psychological Review, Vol. 65, No. 6, pp. 386- 408, November, 1958.
  • 14. Carla P. Gomes CS4700 Single Layer Feed-forward Neural Networks Perceptrons Single-layer neural network (perceptron network) A network with all the inputs connected directly to the outputs Since each output unit is independent of the others, we can limit our study to single output perceptrons. –Output units all operate separately: no shared weights
  • 15. Carla P. Gomes CS4700 Perceptron to Learn to Identify Digits (From Pat. Winston, MIT) Seven line segments are enough to produce all 10 digits 5 3 1 4 6 0 2 Digit x0 x1 x2 x3 x4 x5 x6 0 0 1 1 1 1 1 1 9 1 1 1 1 1 1 0 8 1 1 1 1 1 1 1 7 0 0 1 1 1 0 0 6 1 1 1 0 1 1 1 5 1 1 1 0 1 1 0 4 1 1 0 1 1 0 0 3 1 0 1 1 1 1 0 2 1 0 1 1 0 1 1 1 0 0 0 1 1 0 0
  • 16. Carla P. Gomes CS4700 Perceptron to Learn to Identify Digits (From Pat. Winston, MIT) Seven line segments are enough to produce all 10 digits A vision system reports which of the seven segments in the display are on, therefore producing the inputs for the perceptron. 5 3 1 4 6 0 2
  • 17. Carla P. Gomes CS4700 Perceptron to Learn to Identify Digit 0 Seven line segments are enough to produce all 10 digits A vision system reports which of the seven segments in the display are on, therefore producing the inputs for the perceptron. 5 3 1 4 6 0 2 0 0 0 0 0 -1 1 0 Digit x0 x1 x2 x3 x4 x5 x6 X7 (fixed input) 0 0 1 1 1 1 1 1 1 Sum>0  output=1 Else output=0 When the input digit is 0, what’s the value of sum?
  • 18. Carla P. Gomes CS4700 Single Layer Feed-forward Neural Networks Perceptrons Two input perceptron unit with a sigmoid (logistics) activation function: adjusting weights moves the location, orientation, and steepness of cliff
  • 19. Carla P. Gomes CS4700 Perceptron Learning: Intuition Weight Update  Input Ij (j=1,2,…,n)  Single output O: target output, T. Consider some initial weights Define example error: Err = T – O Now just move weights in right direction! If the error is positive, then we need to increase O. Err >0  need to increase O; Err <0  need to decrease O; Each input unit j, contributes Wj Ij to total input: if Ij is positive, increasing Wj tends to increase O; if Ij is negative, decreasing Wj tends to increase O; So, use: Wj  Wj +   Ij  Err Perceptron Learning Rule (Rosenblatt 1960)  is the learning rate.
  • 20. Carla P. Gomes CS4700 Let’s consider an example (adapted from Patrick Wintson book, MIT) Framework and notation: 0/1 signals Input vector: Weight vector: x0 = 1 and 0=-w0, simulate the threshold. O is output (0 or 1) (single output). Threshold function: Perceptron Learning: Simple Example    n x x x x X , , , 2 1 0     n w w w w W , , , 2 1 0  0 1 0 0        O else O then S x w S n k k k k Learning rate = 1.
  • 21. Carla P. Gomes CS4700 Set of examples, each example is a pair i.e., an input vector and a label y (0 or 1). Learning procedure, called the “error correcting method” • Start with all zero weight vector. • Cycle (repeatedly) through examples and for each example do: – If perceptron is 0 while it should be 1, add the input vector to the weight vector – If perceptron is 1 while it should be 0, subtract the input vector to the weight vector – Otherwise do nothing. Perceptron Learning: Simple Example Wj  Wj +   Ij  Err Err = T – O ) , ( i i y x  Intuitively correct, (e.g., if output is 0 but it should be 1, the weights are increased) ! This procedure provably converges (polynomial number of steps) if the function is represented by a perceptron (i.e., linearly separable)
  • 22. Carla P. Gomes CS4700 Perceptron Learning: Simple Example Consider learning the logical OR function. Our examples are: Sample x0 x1 x2 label 1 1 0 0 0 2 1 0 1 1 3 1 1 0 1 4 1 1 1 1 0 1 0 0        O else O then S x w S n k k k k Activation Function
  • 23. Carla P. Gomes CS4700 Perceptron Learning: Simple Example We’ll use a single perceptron with three inputs. We’ll start with all weights 0 W= <0,0,0> Example 1 I= < 1 0 0> label=0 W= <0,0,0> Perceptron (10+ 00+ 00 =0, S=0) output  0 it classifies it as 0, so correct, do nothing Example 2 I=<1 0 1> label=1 W= <0,0,0> Perceptron (10+ 00+ 10 = 0) output 0 it classifies it as 0, while it should be 1, so we add input to weights W = <0,0,0> + <1,0,1>= <1,0,1> I0 I1 I2 O w0 w1 w2 0 1 0 0        O else O then S x w S n k k k k 1 If perceptron is 0 while it should be 1, add the input vector to the weight vector If perceptron is 1 while it should be 0, subtract the input vector to the weight vector Otherwise do nothing. Error correcting method
  • 24. Carla P. Gomes CS4700 Example 3 I=<1 1 0> label=1 W= <1,0,1> Perceptron (10+ 10+ 00 > 0) output = 1 it classifies it as 1, correct, do nothing W = <1,0,1> Example 4 I=<1 1 1> label=1 W= <1,0,1> Perceptron (10+ 10+ 10 > 0) output = 1 it classifies it as 1, correct, do nothing W = <1,0,1> I0 I1 I2 O w0 w1 w2 1
  • 25. Carla P. Gomes CS4700 Perceptron Learning: Simple Example Epoch 2, through the examples, W = <1,0,1> . Example 1 I = <1,0,0> label=0 W = <1,0,1> Perceptron (11+ 00+ 01 >0) output  1 it classifies it as 1, while it should be 0, so subtract input from weights W = <1,0,1> - <1,0,0> = <0, 0, 1> Example 2 I=<1 0 1> label=1 W= <0,0,1> Perceptron (10+ 00+ 11 > 0) output 1 it classifies it as 1, so correct, do nothing I0 I1 I2 O w0 w1 w2 If perceptron is 0 while it should be 1, add the input vector to the weight vector If perceptron is 1 while it should be 0, subtract the input vector from the weight vector Otherwise do nothing. Error correcting method 1
  • 26. Carla P. Gomes CS4700 Example 3 I=<1 1 0> label=1 W= <0,0,1> Perceptron (10+ 10+ 01 > 0) output = 0 it classifies it as 0, while it should be 1, so add input to weights W = <0,0,1> + W = <1,1,0> = <1, 1, 1> Example 4 I=<1 1 1> label=1 W= <1,1,1> Perceptron (11+ 11+ 11 > 0) output = 1 it classifies it as 1, correct, do nothing W = <1,1,1>
  • 27. Carla P. Gomes CS4700 Perceptron Learning: Simple Example Epoch 3, through the examples, W = <1,1,1> . Example 1 I=<1,0,0> label=0 W = <1,1,1> Perceptron (11+ 01+ 01 >0) output  1 it classifies it as 1, while it should be 0, so subtract input from weights W = <1,1,1> - W = <1,0,0> = <0, 1, 1> Example 2 I=<1 0 1> label=1 W= <0, 1, 1> Perceptron (10+ 01+ 11 > 0) output 1 it classifies it as 1, so correct, do nothing I0 I1 I2 O w0 w1 w2 1
  • 28. Carla P. Gomes CS4700 Example 3 I=<1 1 0> label=1 W= <0, 1, 1> Perceptron (10+ 11+ 01 > 0) output = 1 it classifies it as 1, correct, do nothing Example 4 I=<1 1 1> label=1 W= <0, 1, 1> Perceptron (10+ 11+ 11 > 0) output = 1 it classifies it as 1, correct, do nothing W = <1,1,1>
  • 29. Carla P. Gomes CS4700 Perceptron Learning: Simple Example Epoch 4, through the examples, W= <0, 1, 1>. Example 1 I= <1,0,0> label=0 W = <0,1,1> Perceptron (10+ 01+ 01 = 0) output  0 it classifies it as 0, so correct, do nothing I0 I1 I2 O W0 =0 W1=1 W2=1 OR So the final weight vector W= <0, 1, 1> classifies all examples correctly, and the perceptron has learned the function! Aside: in more realistic cases the bias (W0) will not be 0. (This was just a toy example!) Also, in general, many more inputs (100 to 1000) 1
  • 30.–42. Carla P. Gomes CS4700 Perceptron Learning: Simple Example (full trace; slides 30–42 build this table one example at a time, so the complete table is shown once here)

Epoch | Example | x0 x1 x2 | Desired | w0 w1 w2 | Output | Error | New w0 w1 w2
1 | 1 | 1 0 0 | 0 | 0 0 0 | 0 |  0 | 0 0 0
1 | 2 | 1 0 1 | 1 | 0 0 0 | 0 |  1 | 1 0 1
1 | 3 | 1 1 0 | 1 | 1 0 1 | 1 |  0 | 1 0 1
1 | 4 | 1 1 1 | 1 | 1 0 1 | 1 |  0 | 1 0 1
2 | 1 | 1 0 0 | 0 | 1 0 1 | 1 | -1 | 0 0 1
2 | 2 | 1 0 1 | 1 | 0 0 1 | 1 |  0 | 0 0 1
2 | 3 | 1 1 0 | 1 | 0 0 1 | 0 |  1 | 1 1 1
2 | 4 | 1 1 1 | 1 | 1 1 1 | 1 |  0 | 1 1 1
3 | 1 | 1 0 0 | 0 | 1 1 1 | 1 | -1 | 0 1 1
3 | 2 | 1 0 1 | 1 | 0 1 1 | 1 |  0 | 0 1 1
3 | 3 | 1 1 0 | 1 | 0 1 1 | 1 |  0 | 0 1 1
3 | 4 | 1 1 1 | 1 | 0 1 1 | 1 |  0 | 0 1 1
4 | 1 | 1 0 0 | 0 | 0 1 1 | 0 |  0 | 0 1 1
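The entire trace can be reproduced in a few lines of code. This is a sketch under the slides' assumptions (learning rate α = 1, bias treated as a fixed input x0 = 1, output 1 iff the weighted sum is positive); the function names are mine:

```python
# Sketch of the perceptron training loop behind the trace table above.
def train_perceptron(examples, weights, alpha=1, max_epochs=10):
    """Cycle through the examples until an epoch produces no errors."""
    for epoch in range(1, max_epochs + 1):
        any_error = False
        for inputs, desired in examples:
            total = sum(w * x for w, x in zip(weights, inputs))
            output = 1 if total > 0 else 0
            error = desired - output               # Err = y - hw(x)
            if error != 0:
                any_error = True
                # Wj <- Wj + alpha * Ij * Err
                weights = [w + alpha * x * error
                           for w, x in zip(weights, inputs)]
        if not any_error:
            return weights, epoch
    return weights, max_epochs

# OR; each input vector starts with the fixed bias input x0 = 1.
OR = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
print(train_perceptron(OR, [0, 0, 0]))             # -> ([0, 1, 1], 4)
```

Running this reproduces the table: the weights pass through <1,0,1>, <0,0,1>, <1,1,1>, and settle at <0,1,1> in epoch 4.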
  • 43. Carla P. Gomes CS4700 Threshold perceptrons have some advantages, in particular a simple learning algorithm that fits a threshold perceptron to any linearly separable training set. Key idea: learn by adjusting the weights to reduce the error on the training set, updating the weights repeatedly (over epochs) for each example. We'll use the sum of squared errors (as used, e.g., in linear regression), the classical error measure. Learning is then an optimization search problem in weight space. Derivation of a learning rule for Perceptrons: Minimizing Squared Errors
  • 44. Carla P. Gomes CS4700 Derivation of a learning rule for Perceptrons: Minimizing Squared Errors. Let S = {(x_i, y_i): i = 1, 2, ..., m} be a training set. (Note: each x is a vector of inputs and each y is the corresponding true output.) Let h_w be the perceptron classifier represented by the weight vector w. Definition: $E = \text{Squared Error} = \tfrac{1}{2}\,\mathrm{Err}^2 = \tfrac{1}{2}\bigl(y - h_{\mathbf{w}}(\mathbf{x})\bigr)^2$
  • 45. Carla P. Gomes CS4700 Derivation of a learning rule for Perceptrons: Minimizing Squared Errors. The squared error for a single training example with input x and true output y is $E = \tfrac{1}{2}\bigl(y - h_{\mathbf{w}}(\mathbf{x})\bigr)^2$, where h_w(x) is the output of the perceptron on the example and y is the true output value. We can use gradient descent to reduce the squared error by calculating the partial derivative of E with respect to each weight. Note: g'(in) is the derivative of the activation function; for the sigmoid, g' = g(1 - g). For threshold perceptrons, where g'(in) is undefined, the original perceptron learning rule simply omits it.
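For reference, here is the chain-rule step the slide relies on, written out in the notation above (a standard derivation, not quoted verbatim from the deck):

```latex
E = \tfrac{1}{2}\,\mathrm{Err}^2
  = \tfrac{1}{2}\Bigl(y - g\bigl(\textstyle\sum_{j} W_j\, x_j\bigr)\Bigr)^{2}
\qquad\Longrightarrow\qquad
\frac{\partial E}{\partial W_j}
  = -\,\mathrm{Err}\cdot g'(in)\cdot x_j
```

Moving against the gradient therefore gives the update W_j ← W_j + α · Err · g'(in) · x_j.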
  • 46. Carla P. Gomes CS4700 Gradient descent algorithm: we want to reduce E, so for each weight w_j we change the weight in the direction of steepest descent: Wj ← Wj + α · Ij · Err, where α is the learning rate. Intuitively: if Err = y − hW(x) is positive, the output is too small, so weights are increased for positive inputs and decreased for negative inputs; if Err = y − hW(x) is negative, the opposite.
  • 47. Carla P. Gomes CS4700 Perceptron Learning: Intuition. The rule is intuitively correct! Greedy search: gradient descent through weight space! Surprising proof of convergence: weight space has no local minima! With enough examples, it will find the target function! (provided α is not too large)
  • 48. Carla P. Gomes CS4700 Gradient descent in weight space From T. M. Mitchell, Machine Learning
  • 49. Carla P. Gomes CS4700 Perceptron learning rule: 1. Start with random weights, w = (w1, w2, ..., wn). 2. Select a training example (x, y) ∈ S. 3. Run the perceptron with input x and weights w to obtain output g(in). 4. Let α be the learning rate (a user-set parameter); update the weights by $w_i \leftarrow w_i + \Delta w_i$, where $\Delta w_i = \alpha\,(y - g(in))\,g'(in)\,x_i$ (equivalently, Wj ← Wj + α · Ij · Err). 5. Go to 2. An epoch is a cycle through the examples; the stochastic gradient method instead selects examples randomly from the training set rather than cycling through them. Epochs are repeated until some stopping criterion is reached, typically that the weight changes have become very small.
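A sketch of one stochastic-gradient step for a sigmoid unit, i.e., Δw_i = α (y − g(in)) g'(in) x_i with g' = g(1 − g). The function names and the learning-rate value are illustrative, not from the slides:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(weights, x, y, alpha=0.1):
    """One weight update on a single example (x includes the fixed bias input)."""
    in_ = sum(w * xi for w, xi in zip(weights, x))
    g = sigmoid(in_)
    g_prime = g * (1.0 - g)              # sigmoid derivative: g' = g(1 - g)
    # wi <- wi + alpha * (y - g(in)) * g'(in) * xi
    return [w + alpha * (y - g) * g_prime * xi for w, xi in zip(weights, x)]

# Stochastic gradient: pick examples at random rather than cycling (epochs).
examples = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = [0.0, 0.0, 0.0]
for _ in range(10000):
    x, y = random.choice(examples)
    w = sgd_step(w, x, y)
print(w)  # weights whose sigmoid outputs move toward the OR labels
```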
  • 50. Carla P. Gomes CS4700 Perceptron Learning: Gradient Descent Learning Algorithm
  • 52. Carla P. Gomes CS4700 Expressiveness of Perceptrons. What hypothesis space can a perceptron represent? It can represent even more complex Boolean functions, such as the majority function (output 1 iff more than half of its n inputs are 1, e.g., all weights 1 and threshold n/2). But can it represent any arbitrary Boolean function?
  • 53. Carla P. Gomes CS4700 Expressiveness of Perceptrons. A threshold perceptron returns 1 iff the weighted sum of its inputs (including the bias) is positive, i.e., iff $\sum_{j} W_j\, x_j > 0$, i.e., iff the input is on one side of the hyperplane this equation defines: a linear discriminant function, or linear decision surface. The weights determine the slope and the bias determines the offset. Perceptron ⇒ Linear Separator.
  • 54. Carla P. Gomes CS4700 Linear Separability. A perceptron used for classification: we can view the trained network as defining a "separation line." Consider an example with two inputs, x1, x2 [plot: points marked + on one side of a line in the (x1, x2) plane]. What is its equation? The boundary is where the weighted sum equals the threshold: $w_1 x_1 + w_2 x_2 = w_0 \;\Rightarrow\; x_2 = -\tfrac{w_1}{w_2}\,x_1 + \tfrac{w_0}{w_2}$
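As a concrete check (my numbers, using the OR unit from earlier in these slides, with w1 = w2 = 1 and threshold weight w0 = 0.5 on the fixed −1 input), the separation line is:

```latex
x_1 + x_2 = 0.5
\;\Longrightarrow\;
x_2 = -x_1 + 0.5
```

which leaves (0,0) on one side and (0,1), (1,0), (1,1) on the other, exactly the OR labeling.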
  • 55. Carla P. Gomes CS4700 Linear Separability [plot: the four Boolean input points for OR in the (x1, x2) plane; a single line separates the 0 case (0,0) from the three 1 cases]
  • 56. Carla P. Gomes CS4700 Linear Separability [plot: the four input points for AND; a single line separates the 1 case (1,1) from the three 0 cases]
  • 57. Carla P. Gomes CS4700 Linear Separability [plot: the four input points for XOR; the 1 points (0,1) and (1,0) lie diagonally opposite the 0 points (0,0) and (1,1)]
  • 58. Carla P. Gomes CS4700 Linear Separability [plot: XOR again; no line can separate the 1 points from the 0 points]. XOR is not linearly separable. Minsky & Papert (1969), bad news: perceptrons can only represent linearly separable functions.
  • 59. Carla P. Gomes CS4700 Linear Separability: XOR. Consider a threshold perceptron for the logical XOR function (two inputs), which outputs 1 iff $w_1 x_1 + w_2 x_2 > T$. Our examples are:

# | x1 | x2 | label
1 | 0 | 0 | 0
2 | 1 | 0 | 1
3 | 0 | 1 | 1
4 | 1 | 1 | 0

Given our examples, the perceptron must satisfy the following inequalities:
From (1): 0 + 0 ≤ T ⇒ T ≥ 0
From (2): w1 + 0 > T ⇒ w1 > T
From (3): 0 + w2 > T ⇒ w2 > T
From (4): w1 + w2 ≤ T
But adding (2) and (3) gives w1 + w2 > 2T ≥ T, a contradiction with (4). So XOR is not linearly separable.
  • 60. Carla P. Gomes CS4700 Convergence of Perceptron Learning Algorithm. The perceptron converges to a consistent function, if… the training data are linearly separable… the step size α is sufficiently small… there are no "hidden" units.
  • 61. Carla P. Gomes CS4700 The perceptron learns the majority function easily; DTL (decision-tree learning) is hopeless on it.
  • 62. Carla P. Gomes CS4700 DTL learns the restaurant function easily; the perceptron cannot even represent it.
  • 63. Carla P. Gomes CS4700 Good news: adding a hidden layer allows more target functions to be represented. Minsky & Papert (1969)