CS7015 (Deep Learning) : Lecture 2
McCulloch Pitts Neuron, Thresholding Logic, Perceptrons, Perceptron Learning Algorithm and Convergence, Multilayer Perceptrons (MLPs), Representation Power of MLPs
Mitesh M. Khapra
Department of Computer Science and Engineering
Indian Institute of Technology Madras
Module 2.1: Biological Neurons
Artificial Neuron

[Figure: an artificial neuron with inputs x1, x2, x3, weights w1, w2, w3, a unit σ, and output y]

The most fundamental unit of a deep neural network is called an artificial neuron.
Why is it called a neuron? Where does the inspiration come from?
The inspiration comes from biology (more specifically, from the brain).
biological neurons = neural cells = neural processing units
We will first see what a biological neuron looks like ...
Biological Neurons*

* Image adapted from https://cdn.vectorstock.com/i/composite/12,25/neuron-cell-vector-81225.jpg

dendrite: receives signals from other neurons
synapse: point of connection to other neurons
soma: processes the information
axon: transmits the output of this neuron
Let us see a very cartoonish illustration of how a neuron works.
Our sense organs interact with the outside world.
They relay information to the neurons.
The neurons (may) get activated and produce a response (laughter in this case).
Of course, in reality, it is not just a single neuron which does all this.
There is a massively parallel interconnected network of neurons.
The sense organs relay information to the lowest layer of neurons.
Some of these neurons may fire (in red) in response to this information and in turn relay information to other neurons they are connected to.
These neurons may also fire (again, in red) and the process continues, eventually resulting in a response (laughter in this case).
An average human brain has around 10^11 (100 billion) neurons!
A simplified illustration

This massively parallel network also ensures that there is division of work.
Each neuron may perform a certain role or respond to a certain stimulus.
The neurons in the brain are arranged in a hierarchy.
We illustrate this with the help of the visual cortex (a part of the brain) which deals with processing visual information.
Starting from the retina, the information is relayed to several layers (follow the arrows).
We observe that the layers V1, V2 to AIT form a hierarchy (from identifying simple visual forms to high level objects).
Sample illustration of hierarchical processing*

* Idea borrowed from Hugo Larochelle's lecture slides
Disclaimer
I understand very little about how the brain works!
What you saw so far is an overly simplified explanation of how the brain works!
But this explanation suffices for the purpose of this course!
Module 2.2: McCulloch Pitts Neuron
[Figure: an MP neuron with inputs x1, x2, .., xn ∈ {0, 1}, an aggregation g, a decision function f, and output y ∈ {0, 1}]

McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943).
g aggregates the inputs and the function f takes a decision based on this aggregation.
The inputs can be excitatory or inhibitory.
y = 0 if any xi is inhibitory, else

g(x1, x2, ..., xn) = g(x) = ∑_{i=1}^{n} xi

y = f(g(x)) = 1 if g(x) ≥ θ
            = 0 if g(x) < θ

θ is called the thresholding parameter.
This is called Thresholding Logic.
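A minimal sketch (not from the original slides) of this thresholding logic in Python; the handling of inhibitory inputs follows the footnote used on a later slide (the output is forced to 0 whenever an inhibitory input is 1), and the function name is illustrative:

```python
# Illustrative sketch of a McCulloch Pitts neuron (assumed interface, not from the slides).
# x: sequence of boolean inputs (0/1), theta: threshold,
# inhibitory: indices of inhibitory inputs (if any of them is 1, the output is 0).
def mp_neuron(x, theta, inhibitory=()):
    if any(x[i] == 1 for i in inhibitory):   # an inhibitory input fired -> y = 0
        return 0
    g = sum(x)                               # g aggregates the inputs
    return 1 if g >= theta else 0            # f thresholds the aggregation at theta

# Example: a unit with 3 excitatory inputs and theta = 2
print(mp_neuron([1, 1, 0], theta=2))  # 1
print(mp_neuron([1, 0, 0], theta=2))  # 0
```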
Let us implement some boolean functions using this McCulloch Pitts (MP) neuron
...
A McCulloch Pitts unit: inputs x1, x2, x3, output y ∈ {0, 1}, threshold θ

[Figures: MP units implementing boolean functions, each labelled with its threshold]

AND function: inputs x1, x2, x3, threshold θ = 3
OR function: inputs x1, x2, x3, threshold θ = 1
x1 AND !x2*: inputs x1, x2 (x2 inhibitory), threshold θ = 1
NOR function: inputs x1, x2 (both inhibitory), threshold θ = 0
NOT function: input x1 (inhibitory), threshold θ = 0

* circle at the end indicates inhibitory input: if any inhibitory input is 1 the output will be 0
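As a quick sanity check (illustrative, not part of the slides), the thresholds above can be verified by enumerating the truth tables; inhibition is modelled as in the footnote:

```python
from itertools import product

def mp(x, theta, inhibitory=()):
    # MP unit: output 0 if any inhibitory input is 1, else threshold the sum of inputs
    if any(x[i] for i in inhibitory):
        return 0
    return 1 if sum(x) >= theta else 0

# AND of 3 inputs (theta = 3) and OR of 3 inputs (theta = 1)
for x in product([0, 1], repeat=3):
    assert mp(x, theta=3) == int(all(x))
    assert mp(x, theta=1) == int(any(x))

# NOT (single inhibitory input, theta = 0) and NOR (both inputs inhibitory, theta = 0)
assert [mp((v,), 0, inhibitory=(0,)) for v in (0, 1)] == [1, 0]
assert [mp(x, 0, inhibitory=(0, 1)) for x in product([0, 1], repeat=2)] == [1, 0, 0, 0]
```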
Can any boolean function be represented using a McCulloch Pitts unit?
Before answering this question let us first see the geometric interpretation of an MP unit ...
OR function: an MP unit with inputs x1, x2 and threshold θ = 1, i.e., y = 1 if x1 + x2 = ∑_{i=1}^{2} xi ≥ 1

[Figure: the four input points (0, 0), (0, 1), (1, 0), (1, 1) in the x1-x2 plane with the decision line x1 + x2 = θ = 1]

A single MP neuron splits the input points (4 points for 2 binary inputs) into two halves:
points lying on or above the line ∑_{i=1}^{n} xi − θ = 0 and points lying below this line.
In other words, all inputs which produce an output 0 will be on one side (∑_{i=1}^{n} xi < θ) of the line and all inputs which produce an output 1 will lie on the other side (∑_{i=1}^{n} xi ≥ θ) of this line.
Let us convince ourselves about this with a few more examples (if it is not already clear from the math).
AND function: an MP unit with inputs x1, x2 and threshold θ = 2, i.e., y = 1 if x1 + x2 = ∑_{i=1}^{2} xi ≥ 2

[Figure: the four input points in the x1-x2 plane with the decision line x1 + x2 = θ = 2; only (1, 1) lies on or above it]

Tautology (always ON): an MP unit with inputs x1, x2 and threshold θ = 0

[Figure: the four input points in the x1-x2 plane with the decision line x1 + x2 = θ = 0; all four points lie on or above it]
What if we have more than 2 inputs?

OR function with three inputs: an MP unit with inputs x1, x2, x3 and threshold θ = 1

[Figure: the eight input points (0,0,0), (0,0,1), ..., (1,1,1) at the corners of the unit cube in x1-x2-x3 space, with the plane x1 + x2 + x3 = θ = 1]

Well, instead of a line we will have a plane.
For the OR function, we want a plane such that the point (0,0,0) lies on one side and the remaining 7 points lie on the other side of the plane.
The story so far ...
A single McCulloch Pitts Neuron can be used to represent boolean functions which are linearly separable.
Linear separability (for boolean functions): there exists a line (plane) such that all inputs which produce a 1 lie on one side of the line (plane) and all inputs which produce a 0 lie on the other side of the line (plane).
Module 2.3: Perceptron
The story ahead ...
What about non-boolean (say, real) inputs?
Do we always need to hand code the threshold?
Are all inputs equal? What if we want to assign more weight (importance) to some inputs?
What about functions which are not linearly separable?
[Figure: a perceptron with inputs x1, x2, .., xn, weights w1, w2, .., wn, and output y]

Frank Rosenblatt, an American psychologist, proposed the classical perceptron model (1958).
A more general computational model than McCulloch–Pitts neurons.
Main differences: introduction of numerical weights for inputs and a mechanism for learning these weights.
Inputs are no longer limited to boolean values.
Refined and carefully analyzed by Minsky and Papert (1969) - their model is referred to as the perceptron model here.
[Figure: a perceptron with inputs x0 = 1, x1, x2, .., xn and weights w0 = −θ, w1, w2, .., wn, output y]

y = 1 if ∑_{i=1}^{n} wi ∗ xi ≥ θ
  = 0 if ∑_{i=1}^{n} wi ∗ xi < θ

Rewriting the above,

y = 1 if ∑_{i=1}^{n} wi ∗ xi − θ ≥ 0
  = 0 if ∑_{i=1}^{n} wi ∗ xi − θ < 0

A more accepted convention,

y = 1 if ∑_{i=0}^{n} wi ∗ xi ≥ 0
  = 0 if ∑_{i=0}^{n} wi ∗ xi < 0

where x0 = 1 and w0 = −θ
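A small sketch (illustrative, using NumPy) of this convention: the threshold is folded into the weight vector by prepending x0 = 1 to the input and w0 = −θ to the weights; the particular weight values below are made up for the example:

```python
import numpy as np

def perceptron_output(w, x):
    # w = [w0, w1, ..., wn] with w0 = -theta; x = [x1, ..., xn]
    x = np.concatenate(([1.0], x))           # prepend x0 = 1
    return 1 if np.dot(w, x) >= 0 else 0     # y = 1 iff sum_{i=0}^{n} w_i * x_i >= 0

# Hypothetical example: theta = 2 with unit weights on two inputs (an AND-like unit)
w = np.array([-2.0, 1.0, 1.0])               # w0 = -theta = -2
print(perceptron_output(w, np.array([1.0, 1.0])))   # 1
print(perceptron_output(w, np.array([1.0, 0.0])))   # 0
```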
We will now try to answer the following questions:
Why are we trying to implement boolean functions?
Why do we need weights?
Why is w0 = −θ called the bias?
[Figure: a perceptron with inputs x0 = 1, x1, x2, x3 and weights w0 = −θ, w1, w2, w3, output y]

x1 = isActorDamon
x2 = isGenreThriller
x3 = isDirectorNolan

Consider the task of predicting whether we would like a movie or not.
Suppose, we base our decision on 3 inputs (binary, for simplicity).
Based on our past viewing experience (data), we may give a high weight to isDirectorNolan as compared to the other inputs.
Specifically, even if the actor is not Matt Damon and the genre is not thriller we would still want to cross the threshold θ by assigning a high weight to isDirectorNolan.
w0 is called the bias as it represents the prior (prejudice).
A movie buff may have a very low threshold and may watch any movie irrespective of the genre, actor, director [θ = 0].
On the other hand, a selective viewer may only watch thrillers starring Matt Damon and directed by Nolan [θ = 3].
The weights (w1, w2, ..., wn) and the bias (w0) will depend on the data (viewer history in this case).
What kind of functions can be implemented using the perceptron? Any difference from McCulloch Pitts neurons?
McCulloch Pitts Neuron
(assuming no inhibitory inputs)

y = 1 if ∑_{i=0}^{n} xi ≥ 0
  = 0 if ∑_{i=0}^{n} xi < 0

Perceptron

y = 1 if ∑_{i=0}^{n} wi ∗ xi ≥ 0
  = 0 if ∑_{i=0}^{n} wi ∗ xi < 0

From the equations it should be clear that even a perceptron separates the input space into two halves.
All inputs which produce a 1 lie on one side and all inputs which produce a 0 lie on the other side.
In other words, a single perceptron can only be used to implement linearly separable functions.
Then what is the difference? The weights (including threshold) can be learned and the inputs can be real valued.
We will first revisit some boolean functions and then see the perceptron learning algorithm (for learning weights).
x1  x2  OR
0   0   0    w0 + ∑_{i=1}^{2} wi xi < 0
1   0   1    w0 + ∑_{i=1}^{2} wi xi ≥ 0
0   1   1    w0 + ∑_{i=1}^{2} wi xi ≥ 0
1   1   1    w0 + ∑_{i=1}^{2} wi xi ≥ 0

w0 + w1 · 0 + w2 · 0 < 0  =⇒  w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0  =⇒  w2 ≥ −w0
w0 + w1 · 1 + w2 · 0 ≥ 0  =⇒  w1 ≥ −w0
w0 + w1 · 1 + w2 · 1 ≥ 0  =⇒  w1 + w2 ≥ −w0

One possible solution to this set of inequalities is w0 = −1, w1 = 1.1, w2 = 1.1 (and various other solutions are possible).

[Figure: the four input points in the x1-x2 plane with the separating line −1 + 1.1x1 + 1.1x2 = 0]

Note that we can come up with a similar set of inequalities and find the value of θ for a McCulloch Pitts neuron also (Try it!)
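A quick check (illustrative, not part of the slides) that this particular solution indeed reproduces the OR truth table:

```python
w0, w1, w2 = -1.0, 1.1, 1.1          # one solution to the inequalities above

for x1 in (0, 1):
    for x2 in (0, 1):
        y = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0
        print((x1, x2), y)           # y = 0 only for (0, 0), as required for OR
```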
Module 2.4: Errors and Error Surfaces
Let us fix the threshold (−w0 = 1) and try different values of w1, w2.
Say, w1 = −1, w2 = −1.
What is wrong with this line? We make an error on 3 out of the 4 inputs.
Let us try some more values of w1, w2 and note how many errors we make:

w1     w2     errors
−1     −1     3
1.5    0      1
0.45   0.45   3

We are interested in those values of w0, w1, w2 which result in 0 error.
Let us plot the error surface corresponding to different values of w0, w1, w2.

[Figure: the four input points in the x1-x2 plane along with the candidate lines −1 + 1.1x1 + 1.1x2 = 0, −1 + (−1)x1 + (−1)x2 = 0, −1 + (1.5)x1 + (0)x2 = 0, and −1 + (0.45)x1 + (0.45)x2 = 0]
For ease of analysis, we will keep w0 fixed (−1) and plot the error for different values of w1, w2.
For a given w0, w1, w2 we will compute w0 + w1 ∗ x1 + w2 ∗ x2 for all combinations of (x1, x2) and note down how many errors we make.
For the OR function, an error occurs if (x1, x2) = (0, 0) but w0 + w1 ∗ x1 + w2 ∗ x2 ≥ 0, or if (x1, x2) ≠ (0, 0) but w0 + w1 ∗ x1 + w2 ∗ x2 < 0.
We are interested in finding an algorithm which finds the values of w1, w2 which minimize this error.
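A small sketch (illustrative) of this error computation: with w0 fixed at −1, count the misclassified OR inputs for a given (w1, w2); evaluating this count over a grid of (w1, w2) values gives the error surface:

```python
import numpy as np

inputs  = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]                 # OR function
w0 = -1.0                              # threshold fixed: -w0 = 1

def num_errors(w1, w2):
    # number of inputs misclassified by y = 1 iff w0 + w1*x1 + w2*x2 >= 0
    return sum(int((w0 + w1 * x1 + w2 * x2 >= 0) != t)
               for (x1, x2), t in zip(inputs, targets))

print(num_errors(-1, -1), num_errors(1.5, 0), num_errors(0.45, 0.45))  # 3 1 3
print(num_errors(1.1, 1.1))                                            # 0

# Error surface: error count for each (w1, w2) on a grid
grid = np.linspace(-2, 2, 41)
surface = [[num_errors(w1, w2) for w2 in grid] for w1 in grid]
```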
Module 2.5: Perceptron Learning Algorithm
We will now see a more principled approach for learning these weights and threshold but before that let us answer this question...
Apart from implementing boolean functions (which does not look very interesting) what can a perceptron be used for?
Our interest lies in the use of the perceptron as a binary classifier. Let us see what this means...
[Figure: a perceptron with inputs x0 = 1, x1, x2, .., xn and weights w0 = −θ, w1, w2, .., wn, output y]

x1 = isActorDamon
x2 = isGenreThriller
x3 = isDirectorNolan
x4 = imdbRating (scaled to 0 to 1)
... ...
xn = criticsRating (scaled to 0 to 1)

Let us reconsider our problem of deciding whether to watch a movie or not.
Suppose we are given a list of m movies and a label (class) associated with each movie indicating whether the user liked this movie or not: a binary decision.
Further, suppose we represent each movie with n features (some boolean, some real valued).
We will assume that the data is linearly separable and we want a perceptron to learn how to make this decision.
In other words, we want the perceptron to find the equation of this separating plane (or find the values of w0, w1, w2, .., wn).
Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;
N ← inputs with label 0;
Initialize w randomly;
while !convergence do
    Pick random x ∈ P ∪ N;
    if x ∈ P and ∑_{i=0}^{n} wi ∗ xi < 0 then
        w = w + x;
    end
    if x ∈ N and ∑_{i=0}^{n} wi ∗ xi ≥ 0 then
        w = w − x;
    end
end
// the algorithm converges when all the inputs are classified correctly

Why would this work?
To understand why this works we will have to get into a bit of Linear Algebra and a bit of geometry...
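A direct NumPy translation of the pseudocode above (an illustrative sketch; the function name and the convergence check, which simply tests whether every input is currently classified correctly, are assumptions of this sketch, and the loop terminates only if the data is linearly separable):

```python
import numpy as np

def train_perceptron(P, N, max_iters=10000, seed=0):
    # P: inputs with label 1, N: inputs with label 0 (each with x0 = 1 already prepended)
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(len(P[0]))                     # initialize w randomly
    data = [(x, 1) for x in P] + [(x, 0) for x in N]
    for _ in range(max_iters):
        if all((np.dot(w, x) >= 0) == (label == 1) for x, label in data):
            return w                                       # converged: all inputs correct
        x, label = data[rng.integers(len(data))]           # pick a random x in P or N
        if label == 1 and np.dot(w, x) < 0:
            w = w + x
        elif label == 0 and np.dot(w, x) >= 0:
            w = w - x
    return w

# Example: learn the OR function (x0 = 1 prepended to each input)
P = [np.array([1., 0., 1.]), np.array([1., 1., 0.]), np.array([1., 1., 1.])]
N = [np.array([1., 0., 0.])]
w = train_perceptron(P, N)
print([int(np.dot(w, x) >= 0) for x in P + N])             # expect [1, 1, 1, 0]
```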
Consider two vectors w and x
w = [w0, w1, w2, ..., wn]
x = [1, x1, x2, ..., xn]
w · x = wT
x =
n
X
i=0
wi ∗ xi
We can thus rewrite the perceptron
rule as
y = 1 if wT
x ≥ 0
= 0 if wT
x < 0
We are interested in finding the line
wTx = 0 which divides the input
space into two halves
Every point (x) on this line satisfies
the equation wTx = 0
What can you tell about the angle (α)
between w and any point (x) which
lies on this line ?
The angle is 90◦ (∵ cosα = wT x
||w||||x|| =
0)
Since the vector w is perpendicular to
every point on the line it is actually
perpendicular to the line itself
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
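A quick numerical check of this claim (not from the slides; the vectors are made up): any x with w^T x = 0 makes a 90◦ angle with w.

import numpy as np

w = np.array([1.0, 2.0, -1.0])
x = np.array([1.0, 0.0, 1.0])                      # chosen so that w @ x = 0
cos_alpha = (w @ x) / (np.linalg.norm(w) * np.linalg.norm(x))
print(np.degrees(np.arccos(cos_alpha)))            # 90.0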
37/1
Consider some points (vectors) which lie in the positive half space of this line (i.e., w^T x ≥ 0)
What will be the angle between any such vector and w? Obviously, less than 90◦
What about points (vectors) which lie in the negative half space of this line (i.e., w^T x < 0)
What will be the angle between any such vector and w? Obviously, greater than 90◦
Of course, this also follows from the formula (cos α = (w^T x)/(||w|| ||x||))
Keeping this picture in mind let us revisit the algorithm
[Figure: positive points p1, p2, p3 and negative points n1, n2, n3 in the (x1, x2) plane, with w perpendicular to the line w^T x = 0]
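The same formula can be checked numerically for the two half spaces (again only an illustration, with made-up points, not part of the original slides):

import numpy as np

def angle_with(w, x):
    cos_a = (w @ x) / (np.linalg.norm(w) * np.linalg.norm(x))
    return np.degrees(np.arccos(cos_a))

w = np.array([1.0, 1.0])
p = np.array([2.0, 0.5])                           # w @ p > 0: positive half space
n = np.array([-1.0, -3.0])                         # w @ n < 0: negative half space
print(angle_with(w, p))                            # less than 90 degrees
print(angle_with(w, n))                            # greater than 90 degrees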
38/1
Algorithm: Perceptron Learning Algorithm
P ← inputs with label 1;
N ← inputs with label 0;
Initialize w randomly;
while !convergence do
    Pick random x ∈ P ∪ N;
    if x ∈ P and w.x < 0 then
        w = w + x;
    end
    if x ∈ N and w.x ≥ 0 then
        w = w − x;
    end
end
//the algorithm converges when all the inputs are classified correctly
cos α = (w^T x)/(||w|| ||x||)
For x ∈ P if w.x < 0 then it means that the angle (α) between this x and the current w is greater than 90◦ (but we want α to be less than 90◦)
What happens to the new angle (αnew) when wnew = w + x
cos(αnew) ∝ wnew^T x
          ∝ (w + x)^T x
          ∝ w^T x + x^T x
          ∝ cos α + x^T x
cos(αnew) > cos α    (∵ x^T x > 0)
Thus αnew will be less than α and this is exactly what we want
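The effect of the w = w + x correction can be seen on a small made-up example (the specific numbers are assumptions; the proportionality argument above is the general reason):

import numpy as np

def cos_angle(w, x):
    return (w @ x) / (np.linalg.norm(w) * np.linalg.norm(x))

w = np.array([1.0, -2.0])
x = np.array([1.0, 1.0])        # x in P but w @ x = -1 < 0, so the angle is > 90 degrees
w_new = w + x                   # correction for a misclassified positive point
print(cos_angle(w, x), cos_angle(w_new, x))   # the cosine increases, i.e. the angle shrinks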
39/1
Algorithm: Perceptron Learning Algorithm
P ← inputs with label 1;
N ← inputs with label 0;
Initialize w randomly;
while !convergence do
    Pick random x ∈ P ∪ N;
    if x ∈ P and w.x < 0 then
        w = w + x;
    end
    if x ∈ N and w.x ≥ 0 then
        w = w − x;
    end
end
//the algorithm converges when all the inputs are classified correctly
cos α = (w^T x)/(||w|| ||x||)
For x ∈ N if w.x ≥ 0 then it means that the angle (α) between this x and the current w is less than 90◦ (but we want α to be greater than 90◦)
What happens to the new angle (αnew) when wnew = w − x
cos(αnew) ∝ wnew^T x
          ∝ (w − x)^T x
          ∝ w^T x − x^T x
          ∝ cos α − x^T x
cos(αnew) < cos α    (∵ x^T x > 0)
Thus αnew will be greater than α and this is exactly what we want
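The symmetric check for the w = w − x correction (same cos_angle helper as in the earlier sketch; the numbers are again made up):

import numpy as np

def cos_angle(w, x):
    return (w @ x) / (np.linalg.norm(w) * np.linalg.norm(x))

w = np.array([1.0, 2.0])
x = np.array([1.0, 1.0])        # x in N but w @ x = 3 >= 0, so the angle is < 90 degrees
w_new = w - x                   # correction for a misclassified negative point
print(cos_angle(w, x), cos_angle(w_new, x))   # the cosine decreases, i.e. the angle grows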
40/1
We will now see this algorithm in action for a toy dataset
41/1
[Figure: positive points p1, p2, p3 and negative points n1, n2, n3 in the (x1, x2) plane; the figure is redrawn after every update to w]
We initialized w to a random value
We observe that currently, w · x < 0 (∵ angle > 90◦) for all the positive points and w · x ≥ 0 (∵ angle < 90◦) for all the negative points (the situation is exactly opposite of what we actually want it to be)
We now run the algorithm by randomly going over the points
Randomly pick a point (say, p1), apply correction w = w + x ∵ w · x < 0 (you can check the angle visually)
Randomly pick a point (say, p2), apply correction w = w + x ∵ w · x < 0 (you can check the angle visually)
Randomly pick a point (say, n1), apply correction w = w − x ∵ w · x ≥ 0 (you can check the angle visually)
Randomly pick a point (say, n3), no correction needed ∵ w · x < 0 (you can check the angle visually)
Randomly pick a point (say, n2), no correction needed ∵ w · x < 0 (you can check the angle visually)
Randomly pick a point (say, p3), apply correction w = w + x ∵ w · x < 0 (you can check the angle visually)
Randomly pick a point (say, p1), no correction needed ∵ w · x ≥ 0 (you can check the angle visually)
Randomly pick a point (say, p2), no correction needed ∵ w · x ≥ 0 (you can check the angle visually)
Randomly pick a point (say, n1), no correction needed ∵ w · x < 0 (you can check the angle visually)
Randomly pick a point (say, n3), no correction needed ∵ w · x < 0 (you can check the angle visually)
Randomly pick a point (say, n2), no correction needed ∵ w · x < 0 (you can check the angle visually)
Randomly pick a point (say, p3), no correction needed ∵ w · x ≥ 0 (you can check the angle visually)
The algorithm has converged
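A sketch of this toy run (the exact coordinates of p1 ... n3 are not given on the slides, so the values below are made up; any linearly separable arrangement behaves the same way):

import numpy as np

# hypothetical coordinates; each point is [1, x1, x2] with the leading 1 as the bias input
P = [np.array([1.0, 2.0, 2.0]), np.array([1.0, 3.0, 1.0]), np.array([1.0, 2.5, 3.0])]        # p1, p2, p3
N = [np.array([1.0, -2.0, -1.0]), np.array([1.0, -1.0, -2.5]), np.array([1.0, -3.0, -2.0])]  # n1, n2, n3

rng = np.random.default_rng(0)
w = rng.standard_normal(3)                          # initialize w randomly
data = [(x, 1) for x in P] + [(x, 0) for x in N]

for epoch in range(100):
    corrections = 0
    for i in rng.permutation(len(data)):            # randomly go over the points
        x, label = data[i]
        if label == 1 and w @ x < 0:
            w = w + x; corrections += 1
        elif label == 0 and w @ x >= 0:
            w = w - x; corrections += 1
    if corrections == 0:                            # every point is classified correctly
        print("converged after", epoch + 1, "passes:", w)
        break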
42/1
Module 2.6: Proof of Convergence
43/1
Now that we have some faith and intuition about why the algorithm works, we
will see a more formal proof of convergence ...
44/1
Theorem
Definition: Two sets P and N of points in an n-dimensional space are called absolutely linearly separable if n + 1 real numbers w0, w1, ..., wn exist such that every point (x1, x2, ..., xn) ∈ P satisfies ∑_{i=1}^{n} w_i ∗ x_i > w0 and every point (x1, x2, ..., xn) ∈ N satisfies ∑_{i=1}^{n} w_i ∗ x_i < w0.
Proposition: If the sets P and N are finite and linearly separable, the perceptron learning algorithm updates the weight vector wt a finite number of times. In other words: if the vectors in P and N are tested cyclically one after the other, a weight vector wt is found after a finite number of steps t which can separate the two sets.
Proof: On the next slide
45/1
Setup:
If x ∈ N then −x ∈ P (∵ w^T x < 0 =⇒ w^T (−x) ≥ 0)
We can thus consider a single set P′ = P ∪ N⁻ and for every element p ∈ P′ ensure that w^T p ≥ 0
Further we will normalize all the p's so that ||p|| = 1 (notice that this does not affect the solution ∵ if w^T p/||p|| ≥ 0 then w^T p ≥ 0)
Let w∗ be the normalized solution vector (we know one exists as the data is linearly separable)
Algorithm: Perceptron Learning Algorithm
P ← inputs with label 1;
N ← inputs with label 0;
N⁻ contains negations of all points in N;
P′ ← P ∪ N⁻;
Initialize w randomly;
while !convergence do
    Pick random p ∈ P′;
    p ← p/||p|| (so now, ||p|| = 1);
    if w.p < 0 then
        w = w + p;
    end
end
//the algorithm converges when all the inputs are classified correctly
//notice that we do not need the other if condition because by construction we want all points in P′ to lie in the positive half space w.p ≥ 0
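The setup changes the algorithm only slightly; a minimal sketch of this "proof form" of the algorithm (not from the slides; the names P_prime and N_minus mirror the notation above):

import numpy as np

def perceptron_proof_form(P, N, max_steps=10000, seed=0):
    rng = np.random.default_rng(seed)
    N_minus = [-np.asarray(x, float) for x in N]             # negations of all points in N
    P_prime = [np.asarray(x, float) for x in P] + N_minus    # P' = P union N^-
    P_prime = [p / np.linalg.norm(p) for p in P_prime]       # normalize so that ||p|| = 1
    w = rng.standard_normal(len(P_prime[0]))                 # initialize w randomly
    for _ in range(max_steps):
        if all(w @ p >= 0 for p in P_prime):                 # converged: every p in the positive half space
            return w
        p = P_prime[rng.integers(len(P_prime))]              # pick random p in P'
        if w @ p < 0:
            w = w + p                                        # the only correction that is ever needed
    return w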
46/1
Observations:
w∗ is some optimal solution which exists but we don't know what it is
We do not make a correction at every time-step
We make a correction only if wt · pi ≤ 0 at that time step
So at time-step t we would have made only k (≤ t) corrections
Every time we make a correction a quantity δ gets added to the numerator
So by time-step t, a quantity kδ gets added to the numerator
Proof:
Now suppose at time step t we inspected the point pi and found that wt · pi ≤ 0
We make a correction wt+1 = wt + pi
Let β be the angle between w∗ and wt+1
cos β = (w∗ · wt+1)/||wt+1||
Numerator = w∗ · wt+1 = w∗ · (wt + pi)
          = w∗ · wt + w∗ · pi
          ≥ w∗ · wt + δ    (δ = min{w∗ · pi | ∀i})
          ≥ w∗ · (wt−1 + pj) + δ
          ≥ w∗ · wt−1 + w∗ · pj + δ
          ≥ w∗ · wt−1 + 2δ
          ≥ w∗ · w0 + kδ    (By induction)
47/1
Proof (continued):
So far we have, wt · pi ≤ 0 (and hence we made the correction)
cos β = (w∗ · wt+1)/||wt+1||    (by definition)
Numerator ≥ w∗ · w0 + kδ    (proved by induction)
Denominator² = ||wt+1||²
             = (wt + pi) · (wt + pi)
             = ||wt||² + 2 wt · pi + ||pi||²
             ≤ ||wt||² + ||pi||²    (∵ wt · pi ≤ 0)
             ≤ ||wt||² + 1    (∵ ||pi||² = 1)
             ≤ (||wt−1||² + 1) + 1
             ≤ ||wt−1||² + 2
             ≤ ||w0||² + k    (by the same observation that we made about δ)
48/1
Proof (continued):
So far we have, wt · pi ≤ 0 (and hence we made the correction)
cos β = (w∗ · wt+1)/||wt+1||    (by definition)
Numerator ≥ w∗ · w0 + kδ    (proved by induction)
Denominator² ≤ ||w0||² + k    (by the same observation that we made about δ)
cos β ≥ (w∗ · w0 + kδ)/√(||w0||² + k)
cos β thus grows proportional to √k
As k (the number of corrections) increases, cos β can become arbitrarily large
But since cos β ≤ 1, k must be bounded by a maximum number
Thus, there can only be a finite number of corrections (k) to w and the algorithm will converge!
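The two bounds can also be watched numerically during a run (an illustration only, not a substitute for the proof; the toy data and the hand-picked normalized solution w_star below are assumptions):

import numpy as np

rng = np.random.default_rng(0)
w_star = np.array([1.0, 1.0]) / np.sqrt(2)                     # a known normalized separating vector
P_prime = [np.array([1.0, 0.5]), np.array([0.2, 1.0]), np.array([0.9, 0.1])]
P_prime = [p / np.linalg.norm(p) for p in P_prime]             # ||p|| = 1 for every p
delta = min(w_star @ p for p in P_prime)                       # delta > 0 since w_star separates the data

w0 = rng.standard_normal(2)
w, k = w0.copy(), 0
for _ in range(1000):
    p = P_prime[rng.integers(len(P_prime))]
    if w @ p < 0:                                              # a correction is made
        w = w + p
        k += 1
        lower_bound = (w_star @ w0 + k * delta) / np.sqrt(w0 @ w0 + k)
        cos_beta = (w_star @ w) / np.linalg.norm(w)
        print(k, round(lower_bound, 3), round(cos_beta, 3))    # lower_bound <= cos_beta <= 1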
49/1
Coming back to our questions ...
What about non-boolean (say, real) inputs? Real valued inputs are allowed in perceptron
Do we always need to hand code the threshold? No, we can learn the threshold
Are all inputs equal? What if we want to assign more weight (importance) to some inputs? A perceptron allows weights to be assigned to inputs
What about functions which are not linearly separable? Not possible with a single perceptron but we will see how to handle this ..
50/1
Module 2.7: Linearly Separable Boolean Functions
51/1
So what do we do about functions which are not linearly separable?
Let us see one such simple boolean function first ...
52/1
x1 x2 XOR
0 0 0 w0 +
P2
i=1 wixi < 0
1 0 1 w0 +
P2
i=1 wixi ≥ 0
0 1 1 w0 +
P2
i=1 wixi ≥ 0
1 1 0 w0 +
P2
i=1 wixi < 0
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
52/1
x1 x2 XOR
0 0 0 w0 +
P2
i=1 wixi < 0
1 0 1 w0 +
P2
i=1 wixi ≥ 0
0 1 1 w0 +
P2
i=1 wixi ≥ 0
1 1 0 w0 +
P2
i=1 wixi < 0
w0 + w1 · 0 + w2 · 0 < 0 =⇒ w0 < 0
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
52/1
x1 x2 XOR
0 0 0 w0 +
P2
i=1 wixi < 0
1 0 1 w0 +
P2
i=1 wixi ≥ 0
0 1 1 w0 +
P2
i=1 wixi ≥ 0
1 1 0 w0 +
P2
i=1 wixi < 0
w0 + w1 · 0 + w2 · 0 < 0 =⇒ w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0 =⇒ w2 ≥ −w0
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
52/1
x1 x2 XOR
0 0 0 w0 +
P2
i=1 wixi < 0
1 0 1 w0 +
P2
i=1 wixi ≥ 0
0 1 1 w0 +
P2
i=1 wixi ≥ 0
1 1 0 w0 +
P2
i=1 wixi < 0
w0 + w1 · 0 + w2 · 0 < 0 =⇒ w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0 =⇒ w2 ≥ −w0
w0 + w1 · 1 + w2 · 0 ≥ 0 =⇒ w1 ≥ −w0
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
52/1
x1 x2 XOR
0 0 0   w0 + ∑_{i=1}^{2} wi·xi < 0
1 0 1   w0 + ∑_{i=1}^{2} wi·xi ≥ 0
0 1 1   w0 + ∑_{i=1}^{2} wi·xi ≥ 0
1 1 0   w0 + ∑_{i=1}^{2} wi·xi < 0
w0 + w1 · 0 + w2 · 0 < 0 =⇒ w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0 =⇒ w2 ≥ −w0
w0 + w1 · 1 + w2 · 0 ≥ 0 =⇒ w1 ≥ −w0
w0 + w1 · 1 + w2 · 1 < 0 =⇒ w1 + w2 < −w0
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
52/1
x1 x2 XOR
0 0 0   w0 + ∑_{i=1}^{2} wi·xi < 0
1 0 1   w0 + ∑_{i=1}^{2} wi·xi ≥ 0
0 1 1   w0 + ∑_{i=1}^{2} wi·xi ≥ 0
1 1 0   w0 + ∑_{i=1}^{2} wi·xi < 0
w0 + w1 · 0 + w2 · 0 < 0 =⇒ w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0 =⇒ w2 ≥ −w0
w0 + w1 · 1 + w2 · 0 ≥ 0 =⇒ w1 ≥ −w0
w0 + w1 · 1 + w2 · 1 < 0 =⇒ w1 + w2 < −w0
The fourth condition contradicts conditions 2 and 3 (adding them gives
w1 + w2 ≥ −2w0 > −w0, since w0 < 0 by condition 1)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
52/1
x1 x2 XOR
0 0 0   w0 + ∑_{i=1}^{2} wi·xi < 0
1 0 1   w0 + ∑_{i=1}^{2} wi·xi ≥ 0
0 1 1   w0 + ∑_{i=1}^{2} wi·xi ≥ 0
1 1 0   w0 + ∑_{i=1}^{2} wi·xi < 0
w0 + w1 · 0 + w2 · 0 < 0 =⇒ w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0 =⇒ w2 ≥ −w0
w0 + w1 · 1 + w2 · 0 ≥ 0 =⇒ w1 ≥ −w0
w0 + w1 · 1 + w2 · 1 < 0 =⇒ w1 + w2 < −w0
The fourth condition contradicts conditions 2 and 3 (adding them gives
w1 + w2 ≥ −2w0 > −w0, since w0 < 0 by condition 1)
Hence we cannot have a solution to this set of
inequalities
[Plot: the four input points (0, 0), (0, 1), (1, 0), (1, 1) in the x1–x2 plane]
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
52/1
x1 x2 XOR
0 0 0   w0 + ∑_{i=1}^{2} wi·xi < 0
1 0 1   w0 + ∑_{i=1}^{2} wi·xi ≥ 0
0 1 1   w0 + ∑_{i=1}^{2} wi·xi ≥ 0
1 1 0   w0 + ∑_{i=1}^{2} wi·xi < 0
w0 + w1 · 0 + w2 · 0 < 0 =⇒ w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0 =⇒ w2 ≥ −w0
w0 + w1 · 1 + w2 · 0 ≥ 0 =⇒ w1 ≥ −w0
w0 + w1 · 1 + w2 · 1 < 0 =⇒ w1 + w2 < −w0
The fourth condition contradicts conditions 2 and 3 (adding them gives
w1 + w2 ≥ −2w0 > −w0, since w0 < 0 by condition 1)
Hence we cannot have a solution to this set of
inequalities
[Plot: the four input points (0, 0), (0, 1), (1, 0), (1, 1) in the x1–x2 plane]
And indeed you can see that
it is impossible to draw a
line which separates the red
points from the blue points
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
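Beyond the algebraic contradiction above, you can also convince yourself computationally. The brute-force sketch below (illustrative, not from the slides) searches a grid of candidate weights and finds no single perceptron that reproduces XOR.

import itertools

# Brute-force check: no perceptron "1 iff w0 + w1*x1 + w2*x2 >= 0" computes XOR
xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
grid = [i / 4 for i in range(-20, 21)]   # candidate weights in [-5, 5]

found = any(
    all((1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0) == y
        for (x1, x2), y in xor.items())
    for w0, w1, w2 in itertools.product(grid, repeat=3)
)
print(found)  # -> False: no (w0, w1, w2) in the grid implements XOR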
53/1
Most real world data is not linearly separable
and will always contain some outliers
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
53/1
Most real world data is not linearly separable
and will always contain some outliers
In fact, sometimes there may not be any out-
liers but still the data may not be linearly sep-
arable
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
53/1
[Scatter plot: '+' and 'o' points arranged so that no single line separates them]
Most real world data is not linearly separable
and will always contain some outliers
In fact, sometimes there may not be any out-
liers but still the data may not be linearly sep-
arable
We need computational units (models) which
can deal with such data
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
53/1
[Scatter plot: '+' and 'o' points arranged so that no single line separates them]
Most real world data is not linearly separable
and will always contain some outliers
In fact, sometimes there may not be any out-
liers but still the data may not be linearly sep-
arable
We need computational units (models) which
can deal with such data
While a single perceptron cannot deal with
such data, we will show that a network of per-
ceptrons can indeed deal with such data
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
54/1
Before seeing how a network of perceptrons can deal with linearly inseparable
data, we will discuss boolean functions in some more detail ...
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2
0 0
0 1
1 0
1 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1
0 0 0
0 1 0
1 0 0
1 1 0
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f16
0 0 0 1
0 1 0 1
1 0 0 1
1 1 0 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f16
0 0 0 0 1
0 1 0 0 1
1 0 0 0 1
1 1 0 1 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f8 f16
0 0 0 0 0 1
0 1 0 0 1 1
1 0 0 0 1 1
1 1 0 1 1 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f8 f16
0 0 0 0 0 0 1
0 1 0 0 0 1 1
1 0 0 0 1 1 1
1 1 0 1 0 1 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f8 f16
0 0 0 0 0 0 0 1
0 1 0 0 0 0 1 1
1 0 0 0 1 1 1 1
1 1 0 1 0 1 1 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f8 f16
0 0 0 0 0 0 0 0 1
0 1 0 0 0 0 1 1 1
1 0 0 0 1 1 0 1 1
1 1 0 1 0 1 0 1 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f6 f8 f16
0 0 0 0 0 0 0 0 0 1
0 1 0 0 0 0 1 1 1 1
1 0 0 0 1 1 0 0 1 1
1 1 0 1 0 1 0 1 1 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f16
0 0 0 0 0 0 0 0 0 0 1
0 1 0 0 0 0 1 1 1 1 1
1 0 0 0 1 1 0 0 1 1 1
1 1 0 1 0 1 0 1 0 1 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f9 f16
0 0 0 0 0 0 0 0 0 0 1 1
0 1 0 0 0 0 1 1 1 1 0 1
1 0 0 0 1 1 0 0 1 1 0 1
1 1 0 1 0 1 0 1 0 1 0 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f16
0 0 0 0 0 0 0 0 0 0 1 1 1
0 1 0 0 0 0 1 1 1 1 0 0 1
1 0 0 0 1 1 0 0 1 1 0 0 1
1 1 0 1 0 1 0 1 0 1 0 1 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f16
0 0 0 0 0 0 0 0 0 0 1 1 1 1
0 1 0 0 0 0 1 1 1 1 0 0 0 1
1 0 0 0 1 1 0 0 1 1 0 0 1 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f16
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
0 1 0 0 0 0 1 1 1 1 0 0 0 0 1
1 0 0 0 1 1 0 0 1 1 0 0 1 1 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f16
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1
1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f16
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1
0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1
1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
Of these, how many are linearly separable ?
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
Of these, how many are linearly separable ? (turns out all except XOR and
!XOR - feel free to verify)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
Of these, how many are linearly separable ? (turns out all except XOR and
!XOR - feel free to verify)
In general, how many boolean functions can you have for n inputs ?
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
Of these, how many are linearly separable ? (turns out all except XOR and
!XOR - feel free to verify)
In general, how many boolean functions can you have for n inputs ? 2^(2^n)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
Of these, how many are linearly separable ? (turns out all except XOR and
!XOR - feel free to verify)
In general, how many boolean functions can you have for n inputs ? 2^(2^n)
How many of these 2^(2^n) functions are not linearly separable ?
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1
How many boolean functions can you design from 2 inputs ?
Let us begin with some easy ones which you already know ..
x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
1 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
Of these, how many are linearly separable ? (turns out all except XOR and
!XOR - feel free to verify)
In general, how many boolean functions can you have for n inputs ? 2^(2^n)
How many of these 2^(2^n) functions are not linearly separable ? For the time being,
it suffices to know that at least some of these are not linearly separable
(I encourage you to figure out the exact answer :-) )
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
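For n = 2 these counts are easy to verify computationally. The sketch below (illustrative, not from the slides) enumerates all 2^(2^2) = 16 boolean functions of two inputs and tests each for linear separability by searching a small grid of perceptron weights; it reports 14, i.e., all except XOR and !XOR.

import itertools

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
grid = [i / 2 for i in range(-10, 11)]          # candidate weights in [-5, 5]

def separable(outputs):
    # linearly separable iff some (w0, w1, w2) reproduces the truth table
    # with the rule: output 1 iff w0 + w1*x1 + w2*x2 >= 0
    return any(
        all((1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0) == y
            for (x1, x2), y in zip(inputs, outputs))
        for w0, w1, w2 in itertools.product(grid, repeat=3)
    )

# all 2^(2^2) = 16 boolean functions of 2 inputs, each as a tuple of outputs
functions = list(itertools.product([0, 1], repeat=4))
print(len(functions))                            # -> 16
print(sum(separable(f) for f in functions))      # -> 14 (all but XOR and !XOR)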
56/1
Module 2.8: Representation Power of a Network of
Perceptrons
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
57/1
We will now see how to implement any boolean function using a network of
perceptrons ...
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
58/1
For this discussion, we will assume True
= +1 and False = -1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
58/1
x1 x2
For this discussion, we will assume True
= +1 and False = -1
We consider 2 inputs and 4 perceptrons
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
58/1
x1 x2
For this discussion, we will assume True
= +1 and False = -1
We consider 2 inputs and 4 perceptrons
Each input is connected to all the 4 per-
ceptrons with specific weights
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
58/1
x1 x2
red edge indicates w = -1
blue edge indicates w = +1
For this discussion, we will assume True
= +1 and False = -1
We consider 2 inputs and 4 perceptrons
Each input is connected to all the 4 per-
ceptrons with specific weights
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
58/1
x1 x2
red edge indicates w = -1
blue edge indicates w = +1
For this discussion, we will assume True
= +1 and False = -1
We consider 2 inputs and 4 perceptrons
Each input is connected to all the 4 per-
ceptrons with specific weights
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
58/1
x1 x2
red edge indicates w = -1
blue edge indicates w = +1
For this discussion, we will assume True
= +1 and False = -1
We consider 2 inputs and 4 perceptrons
Each input is connected to all the 4 per-
ceptrons with specific weights
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
58/1
x1 x2
red edge indicates w = -1
blue edge indicates w = +1
For this discussion, we will assume True
= +1 and False = -1
We consider 2 inputs and 4 perceptrons
Each input is connected to all the 4 per-
ceptrons with specific weights
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
58/1
x1 x2
red edge indicates w = -1
blue edge indicates w = +1
For this discussion, we will assume True
= +1 and False = -1
We consider 2 inputs and 4 perceptrons
Each input is connected to all the 4 per-
ceptrons with specific weights
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
58/1
x1 x2
bias =-2
red edge indicates w = -1
blue edge indicates w = +1
For this discussion, we will assume True
= +1 and False = -1
We consider 2 inputs and 4 perceptrons
Each input is connected to all the 4 per-
ceptrons with specific weights
The bias (w0) of each perceptron is -2
(i.e., each perceptron will fire only if the
weighted sum of its input is ≥ 2)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
58/1
x1 x2
bias =-2
red edge indicates w = -1
blue edge indicates w = +1
For this discussion, we will assume True
= +1 and False = -1
We consider 2 inputs and 4 perceptrons
Each input is connected to all the 4 per-
ceptrons with specific weights
The bias (w0) of each perceptron is -2
(i.e., each perceptron will fire only if the
weighted sum of its input is ≥ 2)
Each of these perceptrons is connected to
an output perceptron by weights (which
need to be learned)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
58/1
x1 x2
bias =-2
red edge indicates w = -1
blue edge indicates w = +1
For this discussion, we will assume True
= +1 and False = -1
We consider 2 inputs and 4 perceptrons
Each input is connected to all the 4 per-
ceptrons with specific weights
The bias (w0) of each perceptron is -2
(i.e., each perceptron will fire only if the
weighted sum of its input is ≥ 2)
Each of these perceptrons is connected to
an output perceptron by weights (which
need to be learned)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
58/1
x1 x2
bias =-2
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
For this discussion, we will assume True
= +1 and False = -1
We consider 2 inputs and 4 perceptrons
Each input is connected to all the 4 per-
ceptrons with specific weights
The bias (w0) of each perceptron is -2
(i.e., each perceptron will fire only if the
weighted sum of its input is ≥ 2)
Each of these perceptrons is connected to
an output perceptron by weights (which
need to be learned)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
58/1
x1 x2
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
For this discussion, we will assume True
= +1 and False = -1
We consider 2 inputs and 4 perceptrons
Each input is connected to all the 4 per-
ceptrons with specific weights
The bias (w0) of each perceptron is -2
(i.e., each perceptron will fire only if the
weighted sum of its input is ≥ 2)
Each of these perceptrons is connected to
an output perceptron by weights (which
need to be learned)
The output of this perceptron (y) is the
output of this network
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
59/1
x1 x2
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
Terminology:
This network contains 3 layers
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
59/1
x1 x2
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
Terminology:
This network contains 3 layers
The layer containing the inputs (x1, x2) is
called the input layer
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
59/1
x1 x2
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
Terminology:
This network contains 3 layers
The layer containing the inputs (x1, x2) is
called the input layer
The middle layer containing the 4 per-
ceptrons is called the hidden layer
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
59/1
x1 x2
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
Terminology:
This network contains 3 layers
The layer containing the inputs (x1, x2) is
called the input layer
The middle layer containing the 4 per-
ceptrons is called the hidden layer
The final layer containing one output
neuron is called the output layer
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
59/1
x1 x2
h1 h2 h3 h4
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
Terminology:
This network contains 3 layers
The layer containing the inputs (x1, x2) is
called the input layer
The middle layer containing the 4 per-
ceptrons is called the hidden layer
The final layer containing one output
neuron is called the output layer
The outputs of the 4 perceptrons in the
hidden layer are denoted by h1, h2, h3, h4
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
59/1
x1 x2
h1 h2 h3 h4
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
Terminology:
This network contains 3 layers
The layer containing the inputs (x1, x2) is
called the input layer
The middle layer containing the 4 per-
ceptrons is called the hidden layer
The final layer containing one output
neuron is called the output layer
The outputs of the 4 perceptrons in the
hidden layer are denoted by h1, h2, h3, h4
The red and blue edges are called layer 1
weights
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
59/1
x1 x2
h1 h2 h3 h4
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
Terminology:
This network contains 3 layers
The layer containing the inputs (x1, x2) is
called the input layer
The middle layer containing the 4 per-
ceptrons is called the hidden layer
The final layer containing one output
neuron is called the output layer
The outputs of the 4 perceptrons in the
hidden layer are denoted by h1, h2, h3, h4
The red and blue edges are called layer 1
weights
w1, w2, w3, w4 are called layer 2 weights
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1
x1 x2
h1 h2 h3 h4
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
We claim that this network can be used to
implement any boolean function (linearly
separable or not) !
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1
x1 x2
h1 h2 h3 h4
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
We claim that this network can be used to
implement any boolean function (linearly
separable or not) !
In other words, we can find w1, w2, w3, w4
such that the truth table of any boolean
function can be represented by this net-
work
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1
x1 x2
h1 h2 h3 h4
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
We claim that this network can be used to
implement any boolean function (linearly
separable or not) !
In other words, we can find w1, w2, w3, w4
such that the truth table of any boolean
function can be represented by this net-
work
Astonishing claim!
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1
x1 x2
h1 h2 h3 h4
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
We claim that this network can be used to
implement any boolean function (linearly
separable or not) !
In other words, we can find w1, w2, w3, w4
such that the truth table of any boolean
function can be represented by this net-
work
Astonishing claim! Well, not really, if you
understand what is going on
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1
x1 x2
h1 h2 h3 h4
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
We claim that this network can be used to
implement any boolean function (linearly
separable or not) !
In other words, we can find w1, w2, w3, w4
such that the truth table of any boolean
function can be represented by this net-
work
Astonishing claim! Well, not really, if you
understand what is going on
Each perceptron in the middle layer fires
only for a specific input (and no two per-
ceptrons fire for the same input)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1
x1 x2
h1 h2 h3 h4
-1,-1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
We claim that this network can be used to
implement any boolean function (linearly
separable or not) !
In other words, we can find w1, w2, w3, w4
such that the truth table of any boolean
function can be represented by this net-
work
Astonishing claim! Well, not really, if you
understand what is going on
Each perceptron in the middle layer fires
only for a specific input (and no two per-
ceptrons fire for the same input)
the first perceptron fires for {-1,-1}
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1
x1 x2
h1 h2 h3 h4
-1,-1 -1,1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
We claim that this network can be used to
implement any boolean function (linearly
separable or not) !
In other words, we can find w1, w2, w3, w4
such that the truth table of any boolean
function can be represented by this net-
work
Astonishing claim! Well, not really, if you
understand what is going on
Each perceptron in the middle layer fires
only for a specific input (and no two per-
ceptrons fire for the same input)
the second perceptron fires for {-1,1}
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1
x1 x2
h1 h2 h3 h4
-1,-1 -1,1 1,-1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
We claim that this network can be used to
implement any boolean function (linearly
separable or not) !
In other words, we can find w1, w2, w3, w4
such that the truth table of any boolean
function can be represented by this net-
work
Astonishing claim! Well, not really, if you
understand what is going on
Each perceptron in the middle layer fires
only for a specific input (and no two per-
ceptrons fire for the same input)
the third perceptron fires for {1,-1}
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1
x1 x2
h1 h2 h3 h4
-1,-1 -1,1 1,-1 1,1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
We claim that this network can be used to
implement any boolean function (linearly
separable or not) !
In other words, we can find w1, w2, w3, w4
such that the truth table of any boolean
function can be represented by this net-
work
Astonishing claim! Well, not really, if you
understand what is going on
Each perceptron in the middle layer fires
only for a specific input (and no two per-
ceptrons fire for the same input)
the fourth perceptron fires for {1,1}
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1
x1 x2
h1 h2 h3 h4
-1,-1 -1,1 1,-1 1,1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
We claim that this network can be used to
implement any boolean function (linearly
separable or not) !
In other words, we can find w1, w2, w3, w4
such that the truth table of any boolean
function can be represented by this net-
work
Astonishing claim! Well, not really, if you
understand what is going on
Each perceptron in the middle layer fires
only for a specific input (and no two per-
ceptrons fire for the same input)
Let us see why this network works by tak-
ing an example of the XOR function
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
61/1
x1 x2
h1 h2 h3 h4
-1,-1 -1,1 1,-1 1,1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
Let w0 be the bias of the output neuron
(i.e., it will fire if ∑_{i=1}^{4} wi·hi ≥ w0)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
61/1
x1 x2
h1 h2 h3 h4
-1,-1 -1,1 1,-1 1,1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
Let w0 be the bias of the output neuron
(i.e., it will fire if ∑_{i=1}^{4} wi·hi ≥ w0)
x1 x2 XOR h1 h2 h3 h4 ∑_{i=1}^{4} wi·hi
0 0 0 1 0 0 0 w1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
61/1
x1 x2
h1 h2 h3 h4
-1,-1 -1,1 1,-1 1,1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
Let w0 be the bias of the output neuron
(i.e., it will fire if ∑_{i=1}^{4} wi·hi ≥ w0)
x1 x2 XOR h1 h2 h3 h4 ∑_{i=1}^{4} wi·hi
0 0 0 1 0 0 0 w1
0 1 1 0 1 0 0 w2
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
61/1
x1 x2
h1 h2 h3 h4
-1,-1 -1,1 1,-1 1,1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
Let w0 be the bias of the output neuron
(i.e., it will fire if ∑_{i=1}^{4} wi·hi ≥ w0)
x1 x2 XOR h1 h2 h3 h4 ∑_{i=1}^{4} wi·hi
0 0 0 1 0 0 0 w1
0 1 1 0 1 0 0 w2
1 0 1 0 0 1 0 w3
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
61/1
x1 x2
h1 h2 h3 h4
-1,-1 -1,1 1,-1 1,1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
Let w0 be the bias of the output neuron
(i.e., it will fire if ∑_{i=1}^{4} wi·hi ≥ w0)
x1 x2 XOR h1 h2 h3 h4 ∑_{i=1}^{4} wi·hi
0 0 0 1 0 0 0 w1
0 1 1 0 1 0 0 w2
1 0 1 0 0 1 0 w3
1 1 0 0 0 0 1 w4
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
61/1
x1 x2
h1 h2 h3 h4
-1,-1 -1,1 1,-1 1,1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
Let w0 be the bias of the output neuron
(i.e., it will fire if ∑_{i=1}^{4} wi·hi ≥ w0)
x1 x2 XOR h1 h2 h3 h4 ∑_{i=1}^{4} wi·hi
0 0 0 1 0 0 0 w1
0 1 1 0 1 0 0 w2
1 0 1 0 0 1 0 w3
1 1 0 0 0 0 1 w4
This results in the following four conditions
to implement XOR: w1 < w0, w2 ≥ w0, w3 ≥
w0, w4 < w0
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
61/1
x1 x2
h1 h2 h3 h4
-1,-1 -1,1 1,-1 1,1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
Let w0 be the bias of the output neuron
(i.e., it will fire if ∑_{i=1}^{4} wi·hi ≥ w0)
x1 x2 XOR h1 h2 h3 h4 ∑_{i=1}^{4} wi·hi
0 0 0 1 0 0 0 w1
0 1 1 0 1 0 0 w2
1 0 1 0 0 1 0 w3
1 1 0 0 0 0 1 w4
This results in the following four conditions
to implement XOR: w1 < w0, w2 ≥ w0, w3 ≥
w0, w4 < w0
Unlike before, there are no contradictions now
and the system of inequalities can be satisfied
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
61/1
x1 x2
h1 h2 h3 h4
-1,-1 -1,1 1,-1 1,1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
Let w0 be the bias of the output neuron
(i.e., it will fire if ∑_{i=1}^{4} wi·hi ≥ w0)
x1 x2 XOR h1 h2 h3 h4 ∑_{i=1}^{4} wi·hi
0 0 0 1 0 0 0 w1
0 1 1 0 1 0 0 w2
1 0 1 0 0 1 0 w3
1 1 0 0 0 0 1 w4
This results in the following four conditions
to implement XOR: w1 < w0, w2 ≥ w0, w3 ≥
w0, w4 < w0
Unlike before, there are no contradictions now
and the system of inequalities can be satisfied
Essentially each wi is now responsible for one
of the 4 possible inputs and can be adjusted
to get the desired output for that input
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
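As a sanity check on this construction, here is a small sketch (illustrative, not from the slides) of the network above with ±1 inputs, the fixed ±1 first-layer weights and bias −2 described earlier, and one concrete choice of second-layer weights, w1 = w4 = −1 and w2 = w3 = +1 with output threshold w0 = 1, which satisfies the four inequalities and hence computes XOR.

import itertools

# 4 hidden perceptrons, each firing for exactly one of the four +/-1 input
# patterns, plus one output perceptron; weight values are one illustrative choice
HIDDEN_WEIGHTS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # red = -1, blue = +1
HIDDEN_BIAS = -2                                       # fires iff weighted sum >= 2

def hidden(x1, x2):
    # h_i = 1 iff the i-th perceptron's weighted sum reaches the threshold 2
    return [1 if a * x1 + b * x2 + HIDDEN_BIAS >= 0 else 0
            for (a, b) in HIDDEN_WEIGHTS]

def network(x1, x2, w, w0):
    h = hidden(x1, x2)
    return 1 if sum(wi * hi for wi, hi in zip(w, h)) >= w0 else 0

# one choice satisfying w1 < w0, w2 >= w0, w3 >= w0, w4 < w0
w, w0 = [-1, 1, 1, -1], 1
for x1, x2 in itertools.product([-1, 1], repeat=2):   # -1 = False, +1 = True
    print((x1, x2), network(x1, x2, w, w0))
# -> (-1,-1): 0, (-1,1): 1, (1,-1): 1, (1,1): 0, i.e. XOR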
62/1
x1 x2
h1 h2 h3 h4
-1,-1 -1,1 1,-1 1,1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
It should be clear that the same network
can be used to represent the remaining 15
boolean functions also
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
62/1
x1 x2
h1 h2 h3 h4
-1,-1 -1,1 1,-1 1,1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
It should be clear that the same network
can be used to represent the remaining 15
boolean functions also
Each boolean function will result in a dif-
ferent set of non-contradicting inequalit-
ies which can be satisfied by appropriately
setting w1, w2, w3, w4
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
62/1
x1 x2
h1 h2 h3 h4
-1,-1 -1,1 1,-1 1,1
bias =-2
y
w1 w2 w3 w4
red edge indicates w = -1
blue edge indicates w = +1
It should be clear that the same network
can be used to represent the remaining 15
boolean functions also
Each boolean function will result in a dif-
ferent set of non-contradicting inequalit-
ies which can be satisfied by appropriately
setting w1, w2, w3, w4
Try it!
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
63/1
What if we have more than 2 inputs ?
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
64/1
Again, each of the 8 perceptrons will fire only for one of the 8 possible inputs
Each of the 8 weights in the second layer is responsible for one of the 8 inputs
and can be adjusted to produce the desired output for that input
x1 x2 x3
bias =-3
y
w1 w2 w3 w4 w5 w6 w7 w8
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
65/1
What if we have n inputs ?
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
66/1
Theorem
Any boolean function of n inputs can be represented exactly by a network of
perceptrons containing 1 hidden layer with 2^n perceptrons and one output layer
containing 1 perceptron
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
66/1
Theorem
Any boolean function of n inputs can be represented exactly by a network of
perceptrons containing 1 hidden layer with 2^n perceptrons and one output layer
containing 1 perceptron
Proof (informal): We just saw how to construct such a network
Note: A network of 2^n + 1 perceptrons is not necessary but sufficient. For
example, we already saw how to represent the AND function with just 1 perceptron
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
66/1
Theorem
Any boolean function of n inputs can be represented exactly by a network of
perceptrons containing 1 hidden layer with 2^n perceptrons and one output layer
containing 1 perceptron
Proof (informal): We just saw how to construct such a network
Note: A network of 2^n + 1 perceptrons is not necessary but sufficient. For
example, we already saw how to represent the AND function with just 1 perceptron
Catch: As n increases, the number of perceptrons in the hidden layer obviously
increases exponentially
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
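The informal proof generalizes mechanically: for n inputs, give each of the 2^n hidden perceptrons weights equal to one ±1 input pattern with bias −n, so it fires only for that pattern, and set the corresponding second-layer weight above or below the output threshold according to the desired truth-table entry. A hedged sketch of this construction (illustrative, not from the slides):

import itertools

def build_network(truth_table, n):
    """truth_table maps each +/-1 input tuple of length n to 0 or 1."""
    patterns = list(itertools.product([-1, 1], repeat=n))   # 2^n hidden units
    # second-layer weight: above the output threshold (1) iff the desired output is 1
    w = [1 if truth_table[p] == 1 else -1 for p in patterns]
    w0 = 1                                                   # output threshold

    def net(x):
        # the hidden unit for pattern p fires iff x == p (weighted sum p.x >= n)
        h = [1 if sum(pi * xi for pi, xi in zip(p, x)) >= n else 0 for p in patterns]
        return 1 if sum(wi * hi for wi, hi in zip(w, h)) >= w0 else 0

    return net

# Example: 3-input parity (not linearly separable), represented exactly
n = 3
parity = {p: sum(v == 1 for v in p) % 2 for p in itertools.product([-1, 1], repeat=n)}
net = build_network(parity, n)
print(all(net(p) == parity[p] for p in parity))  # -> True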
67/1
Again, why do we care about boolean functions ?
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
67/1
Again, why do we care about boolean functions ?
How does this help us with our original problem, which was to predict whether
we like a movie or not?
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
67/1
Again, why do we care about boolean functions ?
How does this help us with our original problem, which was to predict whether
we like a movie or not? Let us see!
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
68/1
p1:  x11 x12 . . . x1n   y1 = 1
p2:  x21 x22 . . . x2n   y2 = 1
...
n1:  xk1 xk2 . . . xkn   yk = 0
n2:  xj1 xj2 . . . xjn   yj = 0
...
We are given this data about our past movie
experience
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
68/1
p1:  x11 x12 . . . x1n   y1 = 1
p2:  x21 x22 . . . x2n   y2 = 1
...
n1:  xk1 xk2 . . . xkn   yk = 0
n2:  xj1 xj2 . . . xjn   yj = 0
...
We are given this data about our past movie
experience
For each movie, we are given the values of the
various factors (x1, x2, . . . , xn) that we base
our decision on and we are also given the
value of y (like/dislike)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
68/1
p1:  x11 x12 . . . x1n   y1 = 1
p2:  x21 x22 . . . x2n   y2 = 1
...
n1:  xk1 xk2 . . . xkn   yk = 0
n2:  xj1 xj2 . . . xjn   yj = 0
...
We are given this data about our past movie
experience
For each movie, we are given the values of the
various factors (x1, x2, . . . , xn) that we base
our decision on and we are also given the
value of y (like/dislike)
pi’s are the points for which the output was 1
and ni’s are the points for which it was 0
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
68/1
p1:  x11 x12 . . . x1n   y1 = 1
p2:  x21 x22 . . . x2n   y2 = 1
...
n1:  xk1 xk2 . . . xkn   yk = 0
n2:  xj1 xj2 . . . xjn   yj = 0
...
We are given this data about our past movie
experience
For each movie, we are given the values of the
various factors (x1, x2, . . . , xn) that we base
our decision on and we are also given the
value of y (like/dislike)
pi’s are the points for which the output was 1
and ni’s are the points for which it was 0
The data may or may not be linearly separable
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
68/1
x1 x2 x3
bias =-3
y
w1 w2 w3 w4 w5 w6 w7 w8
p1:  x11 x12 . . . x1n   y1 = 1
p2:  x21 x22 . . . x2n   y2 = 1
...
n1:  xk1 xk2 . . . xkn   yk = 0
n2:  xj1 xj2 . . . xjn   yj = 0
...
We are given this data about our past movie
experience
For each movie, we are given the values of the
various factors (x1, x2, . . . , xn) that we base
our decision on and we are also given the
value of y (like/dislike)
pi’s are the points for which the output was 1
and ni’s are the points for which it was 0
The data may or may not be linearly separable
The proof that we just saw tells us that it is
possible to have a network of perceptrons and
learn the weights in this network such that for
any given pi or nj the output of the network
will be the same as yi or yj (i.e., we can sep-
arate the positive and the negative points)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
69/1
The story so far ...
Networks of the form that we just saw (containing an input layer, an output layer
and one or more hidden layers) are called Multilayer Perceptrons (MLP, in short)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
69/1
The story so far ...
Networks of the form that we just saw (containing an input layer, an output layer
and one or more hidden layers) are called Multilayer Perceptrons (MLP, in short)
More appropriate terminology would be “Multilayered Network of Perceptrons”
but MLP is the more commonly used name
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
69/1
The story so far ...
Networks of the form that we just saw (containing an input layer, an output layer
and one or more hidden layers) are called Multilayer Perceptrons (MLP, in short)
More appropriate terminology would be “Multilayered Network of Perceptrons”
but MLP is the more commonly used name
The theorem that we just saw gives us the representation power of an MLP with
a single hidden layer
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
69/1
The story so far ...
Networks of the form that we just saw (containing an input layer, an output layer
and one or more hidden layers) are called Multilayer Perceptrons (MLP, in short)
More appropriate terminology would be “Multilayered Network of Perceptrons”
but MLP is the more commonly used name
The theorem that we just saw gives us the representation power of an MLP with
a single hidden layer
Specifically, it tells us that an MLP with a single hidden layer can represent any
boolean function
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

NPTEL Deep Learning Lecture notes for session 2

  • 1.
    1/1 CS7015 (Deep Learning): Lecture 2 McCulloch Pitts Neuron, Thresholding Logic, Perceptrons, Perceptron Learning Algorithm and Convergence, Multilayer Perceptrons (MLPs), Representation Power of MLPs Mitesh M. Khapra Department of Computer Science and Engineering Indian Institute of Technology Madras Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 2.
    2/1 Module 2.1: BiologicalNeurons Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 3.
    3/1 x1 x2 x3 σ y w1w2 w3 Artificial Neuron The most fundamental unit of a deep neural network is called an artificial neuron Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 4.
    3/1 x1 x2 x3 σ y w1w2 w3 Artificial Neuron The most fundamental unit of a deep neural network is called an artificial neuron Why is it called a neuron ? Where does the inspiration come from ? Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 5.
    3/1 x1 x2 x3 σ y w1w2 w3 Artificial Neuron The most fundamental unit of a deep neural network is called an artificial neuron Why is it called a neuron ? Where does the inspiration come from ? The inspiration comes from biology (more specifically, from the brain) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 6.
    3/1 x1 x2 x3 σ y w1w2 w3 Artificial Neuron The most fundamental unit of a deep neural network is called an artificial neuron Why is it called a neuron ? Where does the inspiration come from ? The inspiration comes from biology (more specifically, from the brain) biological neurons = neural cells = neural processing units Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 7.
    3/1 x1 x2 x3 σ y w1w2 w3 Artificial Neuron The most fundamental unit of a deep neural network is called an artificial neuron Why is it called a neuron ? Where does the inspiration come from ? The inspiration comes from biology (more specifically, from the brain) biological neurons = neural cells = neural processing units We will first see what a biological neuron looks like ... Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 8.
    4/1 Biological Neurons∗ ∗ Image adaptedfrom https://cdn.vectorstock.com/i/composite/12,25/neuron-cell-vector-81225.jpg Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 9.
    4/1 Biological Neurons∗ ∗ Image adaptedfrom https://cdn.vectorstock.com/i/composite/12,25/neuron-cell-vector-81225.jpg dendrite: receives signals from other neurons Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 10.
    4/1 Biological Neurons∗ ∗ Image adaptedfrom https://cdn.vectorstock.com/i/composite/12,25/neuron-cell-vector-81225.jpg dendrite: receives signals from other neurons synapse: point of connection to other neurons Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 11.
    4/1 Biological Neurons∗ ∗ Image adaptedfrom https://cdn.vectorstock.com/i/composite/12,25/neuron-cell-vector-81225.jpg dendrite: receives signals from other neurons synapse: point of connection to other neurons soma: processes the information Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 12.
    4/1 Biological Neurons∗ ∗ Image adaptedfrom https://cdn.vectorstock.com/i/composite/12,25/neuron-cell-vector-81225.jpg dendrite: receives signals from other neurons synapse: point of connection to other neurons soma: processes the information axon: transmits the output of this neuron Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 13.
    5/1 Let us seea very cartoonish illustra- tion of how a neuron works Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 14.
    5/1 Let us seea very cartoonish illustra- tion of how a neuron works Our sense organs interact with the outside world Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 15.
    5/1 Let us seea very cartoonish illustra- tion of how a neuron works Our sense organs interact with the outside world They relay information to the neur- ons Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 16.
    5/1 Let us seea very cartoonish illustra- tion of how a neuron works Our sense organs interact with the outside world They relay information to the neur- ons The neurons (may) get activated and produces a response (laughter in this case) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 17.
    6/1 Of course, inreality, it is not just a single neuron which does all this Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 18.
    6/1 Of course, inreality, it is not just a single neuron which does all this There is a massively parallel interconnected network of neurons Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 19.
    6/1 Of course, inreality, it is not just a single neuron which does all this There is a massively parallel interconnected network of neurons The sense organs relay information to the low- est layer of neurons Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 20.
    6/1 Of course, inreality, it is not just a single neuron which does all this There is a massively parallel interconnected network of neurons The sense organs relay information to the low- est layer of neurons Some of these neurons may fire (in red) in re- sponse to this information and in turn relay information to other neurons they are connec- ted to Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 21.
    6/1 Of course, inreality, it is not just a single neuron which does all this There is a massively parallel interconnected network of neurons The sense organs relay information to the low- est layer of neurons Some of these neurons may fire (in red) in re- sponse to this information and in turn relay information to other neurons they are connec- ted to These neurons may also fire (again, in red) and the process continues Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 22.
    6/1 Of course, inreality, it is not just a single neuron which does all this There is a massively parallel interconnected network of neurons The sense organs relay information to the low- est layer of neurons Some of these neurons may fire (in red) in re- sponse to this information and in turn relay information to other neurons they are connec- ted to These neurons may also fire (again, in red) and the process continues eventually resulting in a response (laughter in this case) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 23.
    6/1 Of course, inreality, it is not just a single neuron which does all this There is a massively parallel interconnected network of neurons The sense organs relay information to the low- est layer of neurons Some of these neurons may fire (in red) in re- sponse to this information and in turn relay information to other neurons they are connec- ted to These neurons may also fire (again, in red) and the process continues eventually resulting in a response (laughter in this case) An average human brain has around 1011 (100 billion) neurons! Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 24.
    7/1 This massively parallelnetwork also ensures that there is division of work Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 25.
    7/1 This massively parallelnetwork also ensures that there is division of work Each neuron may perform a certain role or respond to a certain stimulus Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 26.
    7/1 A simplified illustration Thismassively parallel network also ensures that there is division of work Each neuron may perform a certain role or respond to a certain stimulus Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 27.
    7/1 A simplified illustration Thismassively parallel network also ensures that there is division of work Each neuron may perform a certain role or respond to a certain stimulus Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 28.
    7/1 A simplified illustration Thismassively parallel network also ensures that there is division of work Each neuron may perform a certain role or respond to a certain stimulus Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 29.
    7/1 A simplified illustration Thismassively parallel network also ensures that there is division of work Each neuron may perform a certain role or respond to a certain stimulus Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 30.
    7/1 A simplified illustration Thismassively parallel network also ensures that there is division of work Each neuron may perform a certain role or respond to a certain stimulus Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 31.
    8/1 The neurons inthe brain are arranged in a hierarchy Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 32.
    8/1 The neurons inthe brain are arranged in a hierarchy We illustrate this with the help of visual cortex (part of the brain) which deals with processing visual informa- tion Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 33.
    8/1 The neurons inthe brain are arranged in a hierarchy We illustrate this with the help of visual cortex (part of the brain) which deals with processing visual informa- tion Starting from the retina, the informa- tion is relayed to several layers (follow the arrows) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 34.
    8/1 The neurons inthe brain are arranged in a hierarchy We illustrate this with the help of visual cortex (part of the brain) which deals with processing visual informa- tion Starting from the retina, the informa- tion is relayed to several layers (follow the arrows) We observe that the layers V 1, V 2 to AIT form a hierarchy (from identify- ing simple visual forms to high level objects) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 35.
    9/1 Sample illustration ofhierarchical processing∗ ∗ Idea borrowed from Hugo Larochelle’s lecture slides Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 36.
    9/1 Sample illustration ofhierarchical processing∗ ∗ Idea borrowed from Hugo Larochelle’s lecture slides Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 37.
    9/1 Sample illustration ofhierarchical processing∗ ∗ Idea borrowed from Hugo Larochelle’s lecture slides Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 38.
    10/1 Disclaimer I understand verylittle about how the brain works! What you saw so far is an overly simplified explanation of how the brain works! But this explanation suffices for the purpose of this course! Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 39.
    11/1 Module 2.2: McCullochPitts Neuron Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 40.
    12/1 McCulloch (neuroscientist) andPitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 41.
    12/1 x1 McCulloch (neuroscientist) andPitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 42.
    12/1 x1 x2 McCulloch (neuroscientist)and Pitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 43.
    12/1 x1 x2 .... McCulloch (neuroscientist) and Pitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 44.
    12/1 x1 x2 .... xn McCulloch (neuroscientist) and Pitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 45.
    12/1 x1 x2 .... xn ∈ {0, 1} McCulloch (neuroscientist) and Pitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 46.
    12/1 x1 x2 .... xn ∈ {0, 1} g McCulloch (neuroscientist) and Pitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) g aggregates the inputs Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 47.
    12/1 x1 x2 .... xn ∈ {0, 1} g f McCulloch (neuroscientist) and Pitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) g aggregates the inputs and the function f takes a decision based on this aggregation Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 48.
    12/1 x1 x2 .... xn ∈ {0, 1} y ∈ {0, 1} g f McCulloch (neuroscientist) and Pitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) g aggregates the inputs and the function f takes a decision based on this aggregation Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 49.
    12/1 x1 x2 .... xn ∈ {0, 1} y ∈ {0, 1} g f McCulloch (neuroscientist) and Pitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) g aggregates the inputs and the function f takes a decision based on this aggregation The inputs can be excitatory or inhibitory Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 50.
    12/1 x1 x2 .... xn ∈ {0, 1} y ∈ {0, 1} g f McCulloch (neuroscientist) and Pitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) g aggregates the inputs and the function f takes a decision based on this aggregation The inputs can be excitatory or inhibitory y = 0 if any xi is inhibitory, else Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 51.
    12/1 x1 x2 .... xn ∈ {0, 1} y ∈ {0, 1} g f McCulloch (neuroscientist) and Pitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) g aggregates the inputs and the function f takes a decision based on this aggregation The inputs can be excitatory or inhibitory y = 0 if any xi is inhibitory, else g(x1, x2, ..., xn) = g(x) = n X i=1 xi Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 52.
    12/1 x1 x2 .... xn ∈ {0, 1} y ∈ {0, 1} g f McCulloch (neuroscientist) and Pitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) g aggregates the inputs and the function f takes a decision based on this aggregation The inputs can be excitatory or inhibitory y = 0 if any xi is inhibitory, else g(x1, x2, ..., xn) = g(x) = n X i=1 xi y = f(g(x)) = 1 if g(x) ≥ θ Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 53.
    12/1 x1 x2 .... xn ∈ {0, 1} y ∈ {0, 1} g f McCulloch (neuroscientist) and Pitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) g aggregates the inputs and the function f takes a decision based on this aggregation The inputs can be excitatory or inhibitory y = 0 if any xi is inhibitory, else g(x1, x2, ..., xn) = g(x) = n X i=1 xi y = f(g(x)) = 1 if g(x) ≥ θ = 0 if g(x) < θ Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 54.
    12/1 x1 x2 .... xn ∈ {0, 1} y ∈ {0, 1} g f McCulloch (neuroscientist) and Pitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) g aggregates the inputs and the function f takes a decision based on this aggregation The inputs can be excitatory or inhibitory y = 0 if any xi is inhibitory, else g(x1, x2, ..., xn) = g(x) = n X i=1 xi y = f(g(x)) = 1 if g(x) ≥ θ = 0 if g(x) < θ θ is called the thresholding parameter Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 55.
    12/1 x1 x2 .... xn ∈ {0, 1} y ∈ {0, 1} g f McCulloch (neuroscientist) and Pitts (logi- cian) proposed a highly simplified computa- tional model of the neuron (1943) g aggregates the inputs and the function f takes a decision based on this aggregation The inputs can be excitatory or inhibitory y = 0 if any xi is inhibitory, else g(x1, x2, ..., xn) = g(x) = n X i=1 xi y = f(g(x)) = 1 if g(x) ≥ θ = 0 if g(x) < θ θ is called the thresholding parameter This is called Thresholding Logic Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 56.
    13/1 Let us implementsome boolean functions using this McCulloch Pitts (MP) neuron ... Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 57.
    14/1 x1 x2 x3 y∈ {0, 1} θ A McCulloch Pitts unit Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 58.
    14/1 x1 x2 x3 y∈ {0, 1} θ A McCulloch Pitts unit x1 x2 x3 y ∈ {0, 1} AND function Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 59.
    14/1 x1 x2 x3 y∈ {0, 1} θ A McCulloch Pitts unit x1 x2 x3 y ∈ {0, 1} 3 AND function Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 60.
    14/1 x1 x2 x3 y∈ {0, 1} θ A McCulloch Pitts unit x1 x2 x3 y ∈ {0, 1} 3 AND function x1 x2 x3 y ∈ {0, 1} OR function Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 61.
    14/1 x1 x2 x3 y∈ {0, 1} θ A McCulloch Pitts unit x1 x2 x3 y ∈ {0, 1} 3 AND function x1 x2 x3 y ∈ {0, 1} 1 OR function Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 62.
    14/1 x1 x2 x3 y∈ {0, 1} θ A McCulloch Pitts unit x1 x2 y ∈ {0, 1} x1 AND !x2 ∗ ∗ circle at the end indicates inhibitory input: if any inhibitory input is 1 the output will be 0 x1 x2 x3 y ∈ {0, 1} 3 AND function x1 x2 x3 y ∈ {0, 1} 1 OR function Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 63.
    14/1 x1 x2 x3 y∈ {0, 1} θ A McCulloch Pitts unit x1 x2 y ∈ {0, 1} 1 x1 AND !x2 ∗ ∗ circle at the end indicates inhibitory input: if any inhibitory input is 1 the output will be 0 x1 x2 x3 y ∈ {0, 1} 3 AND function x1 x2 x3 y ∈ {0, 1} 1 OR function Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 64.
    14/1 x1 x2 x3 y∈ {0, 1} θ A McCulloch Pitts unit x1 x2 y ∈ {0, 1} 1 x1 AND !x2 ∗ ∗ circle at the end indicates inhibitory input: if any inhibitory input is 1 the output will be 0 x1 x2 x3 y ∈ {0, 1} 3 AND function x1 x2 y ∈ {0, 1} NOR function x1 x2 x3 y ∈ {0, 1} 1 OR function Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 65.
    14/1 x1 x2 x3 y∈ {0, 1} θ A McCulloch Pitts unit x1 x2 y ∈ {0, 1} 1 x1 AND !x2 ∗ ∗ circle at the end indicates inhibitory input: if any inhibitory input is 1 the output will be 0 x1 x2 x3 y ∈ {0, 1} 3 AND function x1 x2 y ∈ {0, 1} 0 NOR function x1 x2 x3 y ∈ {0, 1} 1 OR function Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 66.
    14/1 x1 x2 x3 y∈ {0, 1} θ A McCulloch Pitts unit x1 x2 y ∈ {0, 1} 1 x1 AND !x2 ∗ ∗ circle at the end indicates inhibitory input: if any inhibitory input is 1 the output will be 0 x1 x2 x3 y ∈ {0, 1} 3 AND function x1 x2 y ∈ {0, 1} 0 NOR function x1 x2 x3 y ∈ {0, 1} 1 OR function x1 y ∈ {0, 1} NOT function Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 67.
    14/1 x1 x2 x3 y∈ {0, 1} θ A McCulloch Pitts unit x1 x2 y ∈ {0, 1} 1 x1 AND !x2 ∗ ∗ circle at the end indicates inhibitory input: if any inhibitory input is 1 the output will be 0 x1 x2 x3 y ∈ {0, 1} 3 AND function x1 x2 y ∈ {0, 1} 0 NOR function x1 x2 x3 y ∈ {0, 1} 1 OR function x1 y ∈ {0, 1} 0 NOT function Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 68.
    15/1 Can any booleanfunction be represented using a McCulloch Pitts unit ? Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 69.
    15/1 Can any booleanfunction be represented using a McCulloch Pitts unit ? Before answering this question let us first see the geometric interpretation of a MP unit ... Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 70.
    16/1 x1 x2 y ∈{0, 1} 1 OR function x1 + x2 = P2 i=1 xi ≥ 1 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 71.
    16/1 x1 x2 y ∈{0, 1} 1 OR function x1 + x2 = P2 i=1 xi ≥ 1 x1 x2 (0, 0) (0, 1) (1, 0) (1, 1) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 72.
    16/1 x1 x2 y ∈{0, 1} 1 OR function x1 + x2 = P2 i=1 xi ≥ 1 x1 x2 (0, 0) (0, 1) (1, 0) (1, 1) x1 + x2 = θ = 1 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 73.
    16/1 x1 x2 y ∈{0, 1} 1 OR function x1 + x2 = P2 i=1 xi ≥ 1 x1 x2 (0, 0) (0, 1) (1, 0) (1, 1) x1 + x2 = θ = 1 A single MP neuron splits the input points (4 points for 2 binary inputs) into two halves Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 74.
    16/1 x1 x2 y ∈{0, 1} 1 OR function x1 + x2 = P2 i=1 xi ≥ 1 x1 x2 (0, 0) (0, 1) (1, 0) (1, 1) x1 + x2 = θ = 1 A single MP neuron splits the input points (4 points for 2 binary inputs) into two halves Points lying on or above the line Pn i=1 xi−θ = 0 and points lying below this line Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 75.
    16/1 x1 x2 y ∈{0, 1} 1 OR function x1 + x2 = P2 i=1 xi ≥ 1 x1 x2 (0, 0) (0, 1) (1, 0) (1, 1) x1 + x2 = θ = 1 A single MP neuron splits the input points (4 points for 2 binary inputs) into two halves Points lying on or above the line Pn i=1 xi−θ = 0 and points lying below this line In other words, all inputs which produce an output 0 will be on one side ( Pn i=1 xi < θ) of the line and all inputs which produce an output 1 will lie on the other side ( Pn i=1 xi ≥ θ) of this line Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 76.
    16/1 x1 x2 y ∈{0, 1} 1 OR function x1 + x2 = P2 i=1 xi ≥ 1 x1 x2 (0, 0) (0, 1) (1, 0) (1, 1) x1 + x2 = θ = 1 A single MP neuron splits the input points (4 points for 2 binary inputs) into two halves Points lying on or above the line Pn i=1 xi−θ = 0 and points lying below this line In other words, all inputs which produce an output 0 will be on one side ( Pn i=1 xi < θ) of the line and all inputs which produce an output 1 will lie on the other side ( Pn i=1 xi ≥ θ) of this line Let us convince ourselves about this with a few more examples (if it is not already clear from the math) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
17/1
AND function: x1, x2, y ∈ {0, 1}, θ = 2, i.e., x1 + x2 = ∑_{i=1}^{2} xi ≥ 2
[Figure: the four inputs (0, 0), (0, 1), (1, 0), (1, 1) with the line x1 + x2 = θ = 2]
Tautology (always ON): θ = 0
[Figure: the four inputs (0, 0), (0, 1), (1, 0), (1, 1) with the line x1 + x2 = θ = 0]
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
18/1
OR function with three inputs: x1, x2, x3, y ∈ {0, 1}, θ = 1
[Figure: the eight inputs (0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, 1) with the plane x1 + x2 + x3 = θ = 1]
What if we have more than 2 inputs?
Well, instead of a line we will have a plane
For the OR function, we want a plane such that the point (0, 0, 0) lies on one side and the remaining 7 points lie on the other side of the plane
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
19/1
The story so far ...
A single McCulloch Pitts Neuron can be used to represent boolean functions which are linearly separable
Linear separability (for boolean functions): There exists a line (plane) such that all inputs which produce a 1 lie on one side of the line (plane) and all inputs which produce a 0 lie on the other side of the line (plane)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
20/1
Module 2.3: Perceptron
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
21/1
The story ahead ...
What about non-boolean (say, real) inputs?
Do we always need to hand code the threshold?
Are all inputs equal? What if we want to assign more weight (importance) to some inputs?
What about functions which are not linearly separable?
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
22/1
[Figure: a perceptron with inputs x1, x2, ..., xn, weights w1, w2, ..., wn and output y]
Frank Rosenblatt, an American psychologist, proposed the classical perceptron model (1958)
A more general computational model than McCulloch–Pitts neurons
Main differences: introduction of numerical weights for inputs and a mechanism for learning these weights
Inputs are no longer limited to boolean values
Refined and carefully analyzed by Minsky and Papert (1969) - their model is referred to as the perceptron model here
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
23/1
[Figure: a perceptron with inputs x1, x2, ..., xn (and x0 = 1), weights w1, w2, ..., wn (and w0 = −θ) and output y]
y = 1 if ∑_{i=1}^{n} wi ∗ xi ≥ θ
  = 0 if ∑_{i=1}^{n} wi ∗ xi < θ
Rewriting the above,
y = 1 if ∑_{i=1}^{n} wi ∗ xi − θ ≥ 0
  = 0 if ∑_{i=1}^{n} wi ∗ xi − θ < 0
A more accepted convention,
y = 1 if ∑_{i=0}^{n} wi ∗ xi ≥ 0
  = 0 if ∑_{i=0}^{n} wi ∗ xi < 0
where x0 = 1 and w0 = −θ
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
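The folded-in bias makes the rule a single dot product test. A minimal Python sketch of this convention (added for illustration; the weights below are hand-picked, not from the slides):

def perceptron(x, w):
    # x and w include the bias term: x = [1, x1, ..., xn], w = [w0, w1, ..., wn]
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= 0 else 0

# Example with hypothetical weights: theta = 1, so w0 = -1
w = [-1, 0.5, 0.8]
print(perceptron([1, 1, 1], w))   # 0.5 + 0.8 - 1 = 0.3 >= 0 -> 1
print(perceptron([1, 0, 1], w))   # 0.8 - 1 = -0.2 < 0 -> 0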
24/1
We will now try to answer the following questions:
Why are we trying to implement boolean functions?
Why do we need weights?
Why is w0 = −θ called the bias?
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
25/1
[Figure: a perceptron with inputs x0 = 1, x1, x2, x3, weights w0 = −θ, w1, w2, w3 and output y]
x1 = isActorDamon, x2 = isGenreThriller, x3 = isDirectorNolan
Consider the task of predicting whether we would like a movie or not
Suppose, we base our decision on 3 inputs (binary, for simplicity)
Based on our past viewing experience (data), we may give a high weight to isDirectorNolan as compared to the other inputs
Specifically, even if the actor is not Matt Damon and the genre is not thriller we would still want to cross the threshold θ by assigning a high weight to isDirectorNolan
w0 is called the bias as it represents the prior (prejudice)
A movie buff may have a very low threshold and may watch any movie irrespective of the genre, actor, director [θ = 0]
On the other hand, a selective viewer may only watch thrillers starring Matt Damon and directed by Nolan [θ = 3]
The weights (w1, w2, ..., wn) and the bias (w0) will depend on the data (viewer history in this case)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
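A small sketch of this movie example in Python (added for illustration; the numeric weights are hypothetical and only chosen to give isDirectorNolan a dominant role, since in practice they would be learned from viewing history):

w0 = -1.0                      # bias = -theta
w = [0.2, 0.3, 1.5]            # weights for isActorDamon, isGenreThriller, isDirectorNolan

def watch(x1, x2, x3):
    s = w0 + w[0] * x1 + w[1] * x2 + w[2] * x3
    return 1 if s >= 0 else 0

print(watch(0, 0, 1))  # a Nolan film alone crosses the threshold -> 1
print(watch(1, 1, 0))  # a Damon thriller without Nolan: 0.5 - 1 < 0 -> 0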
26/1
What kind of functions can be implemented using the perceptron? Any difference from McCulloch Pitts neurons?
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
27/1
McCulloch Pitts Neuron (assuming no inhibitory inputs):
y = 1 if ∑_{i=0}^{n} xi ≥ 0
  = 0 if ∑_{i=0}^{n} xi < 0
Perceptron:
y = 1 if ∑_{i=0}^{n} wi ∗ xi ≥ 0
  = 0 if ∑_{i=0}^{n} wi ∗ xi < 0
From the equations it should be clear that even a perceptron separates the input space into two halves
All inputs which produce a 1 lie on one side and all inputs which produce a 0 lie on the other side
In other words, a single perceptron can only be used to implement linearly separable functions
Then what is the difference? The weights (including threshold) can be learned and the inputs can be real valued
We will first revisit some boolean functions and then see the perceptron learning algorithm (for learning weights)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
28/1
x1  x2  OR
0   0   0    w0 + ∑_{i=1}^{2} wi xi < 0
1   0   1    w0 + ∑_{i=1}^{2} wi xi ≥ 0
0   1   1    w0 + ∑_{i=1}^{2} wi xi ≥ 0
1   1   1    w0 + ∑_{i=1}^{2} wi xi ≥ 0
w0 + w1 · 0 + w2 · 0 < 0 =⇒ w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0 =⇒ w2 ≥ −w0
w0 + w1 · 1 + w2 · 0 ≥ 0 =⇒ w1 ≥ −w0
w0 + w1 · 1 + w2 · 1 ≥ 0 =⇒ w1 + w2 ≥ −w0
One possible solution to this set of inequalities is w0 = −1, w1 = 1.1, w2 = 1.1 (and various other solutions are possible)
[Figure: the four inputs (0, 0), (0, 1), (1, 0), (1, 1) with the line −1 + 1.1x1 + 1.1x2 = 0]
Note that we can come up with a similar set of inequalities and find the value of θ for a McCulloch Pitts neuron also (Try it!)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
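A quick check (added for illustration) that this particular solution to the inequalities reproduces the OR truth table:

w0, w1, w2 = -1, 1.1, 1.1
for x1 in (0, 1):
    for x2 in (0, 1):
        y = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0
        print((x1, x2), "->", y, "(expected", x1 or x2, ")")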
29/1
Module 2.4: Errors and Error Surfaces
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
30/1
Let us fix the threshold (−w0 = 1) and try different values of w1, w2
Say, w1 = −1, w2 = −1
What is wrong with this line? We make errors on 3 out of the 4 inputs (all three inputs which should produce a 1 are misclassified, consistent with the table below)
Let us try some more values of w1, w2 and note how many errors we make
w1     w2     errors
-1     -1     3
1.5    0      1
0.45   0.45   3
We are interested in those values of w0, w1, w2 which result in 0 error
Let us plot the error surface corresponding to different values of w0, w1, w2
[Figure: the four inputs (0, 0), (0, 1), (1, 0), (1, 1) with the lines −1 + 1.1x1 + 1.1x2 = 0, −1 + (−1)x1 + (−1)x2 = 0, −1 + (1.5)x1 + (0)x2 = 0 and −1 + (0.45)x1 + (0.45)x2 = 0]
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
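A short sketch (added for illustration) that counts the OR-function errors for the candidate weights in the table above, with w0 fixed at −1:

def errors(w1, w2, w0=-1):
    count = 0
    for x1 in (0, 1):
        for x2 in (0, 1):
            pred = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0
            count += (pred != (x1 or x2))
    return count

for w1, w2 in [(-1, -1), (1.5, 0), (0.45, 0.45), (1.1, 1.1)]:
    print((w1, w2), "errors:", errors(w1, w2))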
31/1
For ease of analysis, we will keep w0 fixed (−1) and plot the error for different values of w1, w2
For a given w0, w1, w2 we will compute w0 + w1 ∗ x1 + w2 ∗ x2 for all combinations of (x1, x2) and note down how many errors we make
For the OR function, an error occurs if (x1, x2) = (0, 0) but w0 + w1 ∗ x1 + w2 ∗ x2 ≥ 0, or if (x1, x2) ≠ (0, 0) but w0 + w1 ∗ x1 + w2 ∗ x2 < 0
We are interested in finding an algorithm which finds the values of w1, w2 which minimize this error
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
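A sketch of this error surface computation (added for illustration; the grid range and step are my choices): sweep w1, w2 over a grid with w0 fixed at −1, count errors on the OR data at each cell, and report a best cell.

import itertools

def error_count(w0, w1, w2):
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    wrong = 0
    for (x1, x2), label in data:
        pred = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0
        wrong += (pred != label)
    return wrong

grid = [i / 10 for i in range(-20, 21)]     # w1, w2 in [-2, 2]
surface = {(w1, w2): error_count(-1, w1, w2)
           for w1, w2 in itertools.product(grid, grid)}
best = min(surface, key=surface.get)
print("best (w1, w2):", best, "errors:", surface[best])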
32/1
Module 2.5: Perceptron Learning Algorithm
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
33/1
We will now see a more principled approach for learning these weights and threshold but before that let us answer this question ...
Apart from implementing boolean functions (which does not look very interesting) what can a perceptron be used for?
Our interest lies in the use of perceptron as a binary classifier. Let us see what this means ...
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
34/1
[Figure: a perceptron with inputs x0 = 1, x1, x2, ..., xn, weights w0 = −θ, w1, w2, ..., wn and output y]
x1 = isActorDamon, x2 = isGenreThriller, x3 = isDirectorNolan, x4 = imdbRating (scaled to 0 to 1), ..., xn = criticsRating (scaled to 0 to 1)
Let us reconsider our problem of deciding whether to watch a movie or not
Suppose we are given a list of m movies and a label (class) associated with each movie indicating whether the user liked this movie or not: binary decision
Further, suppose we represent each movie with n features (some boolean, some real valued)
We will assume that the data is linearly separable and we want a perceptron to learn how to make this decision
In other words, we want the perceptron to find the equation of this separating plane (or find the values of w0, w1, w2, ..., wn)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
35/1
Algorithm: Perceptron Learning Algorithm
P ← inputs with label 1;
N ← inputs with label 0;
Initialize w randomly;
while !convergence do
    Pick random x ∈ P ∪ N;
    if x ∈ P and ∑_{i=0}^{n} wi ∗ xi < 0 then
        w = w + x;
    end
    if x ∈ N and ∑_{i=0}^{n} wi ∗ xi ≥ 0 then
        w = w − x;
    end
end
// the algorithm converges when all the inputs are classified correctly
Why would this work?
To understand why this works we will have to get into a bit of Linear Algebra and a bit of geometry ...
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
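A compact Python rendering of the algorithm above (a sketch: the max_iters safety cap and the helper names are my additions, not part of the lecture's pseudocode):

import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def all_correct(w, P, N):
    # convergence check: every input is classified correctly
    return all(dot(w, x) >= 0 for x in P) and all(dot(w, x) < 0 for x in N)

def train_perceptron(P, N, max_iters=100000):
    # P: inputs with label 1, N: inputs with label 0; each x is [1, x1, ..., xn]
    w = [random.uniform(-1, 1) for _ in range(len(P[0]))]
    it = 0
    while not all_correct(w, P, N) and it < max_iters:
        x = random.choice(P + N)
        if x in P and dot(w, x) < 0:
            w = [wi + xi for wi, xi in zip(w, x)]    # w = w + x
        elif x in N and dot(w, x) >= 0:
            w = [wi - xi for wi, xi in zip(w, x)]    # w = w - x
        it += 1
    return w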
36/1
Consider two vectors w and x
w = [w0, w1, w2, ..., wn]
x = [1, x1, x2, ..., xn]
w · x = wᵀx = ∑_{i=0}^{n} wi ∗ xi
We can thus rewrite the perceptron rule as
y = 1 if wᵀx ≥ 0
  = 0 if wᵀx < 0
We are interested in finding the line wᵀx = 0 which divides the input space into two halves
Every point (x) on this line satisfies the equation wᵀx = 0
What can you tell about the angle (α) between w and any point (x) which lies on this line?
The angle is 90◦ (∵ cos α = wᵀx / (||w|| ||x||) = 0)
Since the vector w is perpendicular to every point on the line it is actually perpendicular to the line itself
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
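A small numeric check (added for illustration; the particular vectors are made up): for any x on the line wᵀx = 0, the angle between w and x is 90◦.

import math

w = [1.0, 2.0, -1.0]          # an arbitrary weight vector
x = [1.0, 0.0, 1.0]           # chosen so that w.x = 1*1 + 2*0 + (-1)*1 = 0
dot = sum(wi * xi for wi, xi in zip(w, x))
norm = lambda v: math.sqrt(sum(vi * vi for vi in v))
cos_alpha = dot / (norm(w) * norm(x))
print(cos_alpha, math.degrees(math.acos(cos_alpha)))   # 0.0, 90.0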
37/1
[Figure: the line wᵀx = 0 in the x1-x2 plane with w perpendicular to it, positive points p1, p2, p3 on one side and negative points n1, n2, n3 on the other]
Consider some points (vectors) which lie in the positive half space of this line (i.e., wᵀx ≥ 0)
What will be the angle between any such vector and w? Obviously, less than 90◦
What about points (vectors) which lie in the negative half space of this line (i.e., wᵀx < 0)
What will be the angle between any such vector and w? Obviously, greater than 90◦
Of course, this also follows from the formula (cos α = wᵀx / (||w|| ||x||))
Keeping this picture in mind let us revisit the algorithm
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
38/1
Algorithm: Perceptron Learning Algorithm (repeated from slide 35), with cos α = wᵀx / (||w|| ||x||)
For x ∈ P if w · x < 0 then it means that the angle (α) between this x and the current w is greater than 90◦ (but we want α to be less than 90◦)
What happens to the new angle (αnew) when wnew = w + x?
cos(αnew) ∝ wnewᵀx
          ∝ (w + x)ᵀx
          ∝ wᵀx + xᵀx
          ∝ cos α + xᵀx
cos(αnew) > cos α
Thus αnew will be less than α and this is exactly what we want
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
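A numeric illustration of this argument (added; the vectors are made up): for a misclassified positive point x, the update wnew = w + x increases cos α.

import math

def cos_angle(w, x):
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = lambda v: math.sqrt(sum(vi * vi for vi in v))
    return dot / (norm(w) * norm(x))

w = [-1.0, -0.5]        # current weights
x = [1.0, 2.0]          # a positive point with w.x = -2 < 0 (misclassified)
w_new = [wi + xi for wi, xi in zip(w, x)]
print(cos_angle(w, x), "->", cos_angle(w_new, x))   # the angle shrinks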
39/1
Algorithm: Perceptron Learning Algorithm (repeated from slide 35), with cos α = wᵀx / (||w|| ||x||)
For x ∈ N if w · x ≥ 0 then it means that the angle (α) between this x and the current w is less than 90◦ (but we want α to be greater than 90◦)
What happens to the new angle (αnew) when wnew = w − x?
cos(αnew) ∝ wnewᵀx
          ∝ (w − x)ᵀx
          ∝ wᵀx − xᵀx
          ∝ cos α − xᵀx
cos(αnew) < cos α
Thus αnew will be greater than α and this is exactly what we want
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
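The mirrored check (added; again with made-up vectors): for a misclassified negative point x, the update wnew = w − x decreases cos α, i.e., increases the angle.

import math

def cos_angle(w, x):
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = lambda v: math.sqrt(sum(vi * vi for vi in v))
    return dot / (norm(w) * norm(x))

w = [1.0, 0.5]          # current weights
x = [1.0, 2.0]          # a negative point with w.x = 2 >= 0 (misclassified)
w_new = [wi - xi for wi, xi in zip(w, x)]
print(cos_angle(w, x), "->", cos_angle(w_new, x))   # the angle grows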
40/1
We will now see this algorithm in action for a toy dataset
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 227.
41/1 [Figure: a 2-D toy dataset in the (x1, x2) plane with positive points p1, p2, p3 and negative points n1, n2, n3, along with the current w]
We initialized w to a random value
We observe that currently, w · x < 0 (∵ angle > 90◦) for all the positive points and w · x ≥ 0 (∵ angle < 90◦) for all the negative points (the situation is exactly opposite of what we actually want it to be)
We now run the algorithm by randomly going over the points (at each step you can check the angle visually):
Pick p1: w · x < 0, apply correction w = w + x
Pick p2: w · x < 0, apply correction w = w + x
Pick n1: w · x ≥ 0, apply correction w = w − x
Pick n3: w · x < 0, no correction needed
Pick n2: w · x < 0, no correction needed
Pick p3: w · x < 0, apply correction w = w + x
Pick p1: w · x ≥ 0, no correction needed
Pick p2: w · x ≥ 0, no correction needed
Pick n1: w · x < 0, no correction needed
Pick n3: w · x < 0, no correction needed
Pick n2: w · x < 0, no correction needed
Pick p3: w · x ≥ 0, no correction needed
The algorithm has converged
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
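A small self-contained run in the same spirit as the figure above (a sketch, not from the lecture): the actual coordinates of p1..p3 and n1..n3 are not given, so the values below are hypothetical but linearly separable.

```python
import numpy as np

# Hypothetical coordinates for p1..p3 and n1..n3; separable by design.
P = np.array([[2.0, 1.0], [1.5, 2.0], [2.5, 2.5]])        # p1, p2, p3
N = np.array([[-1.0, -2.0], [-2.0, -1.0], [-1.5, -2.5]])  # n1, n2, n3

rng = np.random.default_rng(1)
w = rng.normal(size=2)                       # random initial w
X, y = np.vstack([P, N]), np.array([1, 1, 1, 0, 0, 0])

converged = False
while not converged:
    converged = True
    for i in rng.permutation(len(X)):
        x, label = X[i], y[i]
        if label == 1 and w @ x < 0:         # w = w + x for a misclassified positive
            w, converged = w + x, False
        elif label == 0 and w @ x >= 0:      # w = w - x for a misclassified negative
            w, converged = w - x, False

print("learned w      :", w)
print("w.x, positives :", P @ w)             # all entries should be >= 0
print("w.x, negatives :", N @ w)             # all entries should be < 0
```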
42/1 Module 2.6: Proof of Convergence
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
43/1 Now that we have some faith and intuition about why the algorithm works, we will see a more formal proof of convergence ...
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
44/1 Theorem
Definition: Two sets P and N of points in an n-dimensional space are called absolutely linearly separable if n + 1 real numbers w0, w1, ..., wn exist such that every point (x1, x2, ..., xn) ∈ P satisfies ∑_{i=1}^{n} wi·xi > w0 and every point (x1, x2, ..., xn) ∈ N satisfies ∑_{i=1}^{n} wi·xi < w0.
Proposition: If the sets P and N are finite and linearly separable, the perceptron learning algorithm updates the weight vector wt a finite number of times. In other words: if the vectors in P and N are tested cyclically one after the other, a weight vector wt is found after a finite number of steps t which can separate the two sets.
Proof: On the next slide
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
45/1 Setup:
If x ∈ N then −x ∈ P (∵ w · x < 0 =⇒ w · (−x) ≥ 0)
We can thus consider a single set P′ = P ∪ N⁻ and for every element p ∈ P′ ensure that w · p ≥ 0
Further we will normalize all the p's so that ||p|| = 1 (notice that this does not affect the solution ∵ if w · (p/||p||) ≥ 0 then w · p ≥ 0)
Let w∗ be the normalized solution vector (we know one exists as the data is linearly separable)
Algorithm: Perceptron Learning Algorithm
P ← inputs with label 1;
N ← inputs with label 0;
N⁻ contains negations of all points in N;
P′ ← P ∪ N⁻ ;
Initialize w randomly;
while !convergence do
  Pick random p ∈ P′ ;
  p ← p/||p|| (so now, ||p|| = 1) ;
  if w · p < 0 then w = w + p ; end
end
//the algorithm converges when all the inputs are classified correctly
//notice that we do not need the other if condition because by construction we want all points in P′ to lie in the positive half space w · p ≥ 0
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
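The simplified form of the algorithm used in the proof can be sketched as follows (my own illustration, under the stated setup; the update cap is an added safety assumption).

```python
import numpy as np

def perceptron_single_set(P, N, max_updates=10_000, seed=0):
    """Perceptron learning in the simplified form used by the proof:
    fold N into P by negation, normalize every point to unit length, and use
    the single update w = w + p whenever w.p < 0."""
    rng = np.random.default_rng(seed)
    P_prime = np.vstack([P, -N])                                        # P' = P ∪ N⁻
    P_prime = P_prime / np.linalg.norm(P_prime, axis=1, keepdims=True)  # ||p|| = 1
    w = rng.normal(size=P_prime.shape[1])
    updates = 0
    while (P_prime @ w < 0).any() and updates < max_updates:
        p = P_prime[rng.integers(len(P_prime))]     # pick a random p ∈ P'
        if w @ p < 0:
            w = w + p                               # the only correction needed
            updates += 1
    return w, updates
```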
46/1 Observations:
w∗ is some optimal solution which exists but we don't know what it is
We do not make a correction at every time-step
We make a correction only if wt · pi ≤ 0 at that time step
So at time-step t we would have made only k (≤ t) corrections
Every time we make a correction a quantity δ gets added to the numerator
So by time-step t, a quantity kδ gets added to the numerator
Proof:
Now suppose that at time step t we inspected the point pi and found that wt · pi ≤ 0
We make a correction wt+1 = wt + pi
Let β be the angle between w∗ and wt+1
cos β = (w∗ · wt+1) / ||wt+1||
Numerator = w∗ · wt+1 = w∗ · (wt + pi)
  = w∗ · wt + w∗ · pi
  ≥ w∗ · wt + δ    (where δ = min{w∗ · pi | ∀i})
  ≥ w∗ · (wt−1 + pj) + δ
  ≥ w∗ · wt−1 + w∗ · pj + δ
  ≥ w∗ · wt−1 + 2δ
  ≥ w∗ · w0 + kδ    (by induction)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
47/1 Proof (continued): So far we have,
wt · pi ≤ 0 (and hence we made the correction)
cos β = (w∗ · wt+1) / ||wt+1|| (by definition)
Numerator ≥ w∗ · w0 + kδ (proved by induction)
Denominator² = ||wt+1||² = (wt + pi) · (wt + pi)
  = ||wt||² + 2 wt · pi + ||pi||²
  ≤ ||wt||² + ||pi||²    (∵ wt · pi ≤ 0)
  ≤ ||wt||² + 1    (∵ ||pi||² = 1)
  ≤ (||wt−1||² + 1) + 1 = ||wt−1||² + 2
  ≤ ||w0||² + k    (by the same observation that we made about δ)
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
48/1 Proof (continued): So far we have,
wt · pi ≤ 0 (and hence we made the correction)
cos β = (w∗ · wt+1) / ||wt+1|| (by definition)
Numerator ≥ w∗ · w0 + kδ (proved by induction)
Denominator² ≤ ||w0||² + k (by the same observation that we made about δ)
Therefore cos β ≥ (w∗ · w0 + kδ) / √(||w0||² + k)
cos β thus grows proportional to √k
As k (the number of corrections) increases, cos β can become arbitrarily large
But since cos β ≤ 1, k must be bounded by a maximum number
Thus, there can only be a finite number of corrections (k) to w and the algorithm will converge!
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
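To make the last step explicit, here is a short worked version of the bound (an addition, not in the lecture) under the simplifying assumption that w is initialized to the zero vector:

```latex
% Assuming w_0 = 0 (a simplifying assumption made here, not in the lecture):
%   Numerator:    w^* \cdot w_{t+1} \;\ge\; k\delta
%   Denominator:  \|w_{t+1}\|       \;\le\; \sqrt{k}
% Combining these with \cos\beta \le 1:
\[
  1 \;\ge\; \cos\beta \;\ge\; \frac{k\delta}{\sqrt{k}} \;=\; \delta\sqrt{k}
  \qquad\Longrightarrow\qquad
  k \;\le\; \frac{1}{\delta^{2}}
\]
% Here \delta = \min_i \, w^* \cdot p_i > 0 because the normalized data is
% absolutely linearly separable, so the number of corrections k is finite.
```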
49/1 Coming back to our questions ...
What about non-boolean (say, real) inputs? Real valued inputs are allowed in a perceptron
Do we always need to hand code the threshold? No, we can learn the threshold
Are all inputs equal? What if we want to assign more weight (importance) to some inputs? A perceptron allows weights to be assigned to inputs
What about functions which are not linearly separable? Not possible with a single perceptron, but we will see how to handle this ..
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
50/1 Module 2.7: Linearly Separable Boolean Functions
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
51/1 So what do we do about functions which are not linearly separable?
Let us see one such simple boolean function first ...
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
52/1
x1  x2  XOR
 0   0   0    w0 + ∑_{i=1}^{2} wi·xi < 0
 1   0   1    w0 + ∑_{i=1}^{2} wi·xi ≥ 0
 0   1   1    w0 + ∑_{i=1}^{2} wi·xi ≥ 0
 1   1   0    w0 + ∑_{i=1}^{2} wi·xi < 0
w0 + w1 · 0 + w2 · 0 < 0 =⇒ w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0 =⇒ w2 ≥ −w0
w0 + w1 · 1 + w2 · 0 ≥ 0 =⇒ w1 ≥ −w0
w0 + w1 · 1 + w2 · 1 < 0 =⇒ w1 + w2 < −w0
The fourth condition contradicts conditions 2 and 3 (adding conditions 2 and 3 gives w1 + w2 ≥ −2w0, and since w0 < 0 we have −2w0 > −w0, so w1 + w2 > −w0)
Hence we cannot have a solution to this set of inequalities
[Figure: the four points (0, 0), (0, 1), (1, 0), (1, 1) in the (x1, x2) plane, with the two XOR classes shown in different colours]
And indeed you can see that it is impossible to draw a line which separates the red points from the blue points
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
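As an empirical sanity check of the argument above (a sketch, not from the lecture), a grid search over candidate weights finds no single perceptron that reproduces XOR, while a separable function such as OR is found easily:

```python
import numpy as np
from itertools import product

X = np.array(list(product([0, 1], repeat=2)))   # rows (0,0), (0,1), (1,0), (1,1)
XOR = np.array([0, 1, 1, 0])
OR = np.array([0, 1, 1, 1])

def separable(targets, grid=np.linspace(-2, 2, 41)):
    # Try every (w0, w1, w2) on a coarse grid with rule: output 1 iff w0 + w1*x1 + w2*x2 >= 0
    for w0, w1, w2 in product(grid, repeat=3):
        pred = (w0 + X @ np.array([w1, w2]) >= 0).astype(int)
        if np.array_equal(pred, targets):
            return True
    return False

print("OR  separable:", separable(OR))    # True
print("XOR separable:", separable(XOR))   # False (as the inequalities above show)
```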
53/1 [Figure: a cloud of '+' points and a cloud of 'o' points that no single straight line can separate]
Most real world data is not linearly separable and will always contain some outliers
In fact, sometimes there may not be any outliers but still the data may not be linearly separable
We need computational units (models) which can deal with such data
While a single perceptron cannot deal with such data, we will show that a network of perceptrons can indeed deal with such data
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
54/1 Before seeing how a network of perceptrons can deal with linearly inseparable data, we will discuss boolean functions in some more detail ...
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
55/1 How many boolean functions can you design from 2 inputs?
Let us begin with some easy ones which you already know ..
x1 x2 | f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16
 0  0 |  0  0  0  0  0  0  0  0  1   1   1   1   1   1   1   1
 0  1 |  0  0  0  0  1  1  1  1  0   0   0   0   1   1   1   1
 1  0 |  0  0  1  1  0  0  1  1  0   0   1   1   0   0   1   1
 1  1 |  0  1  0  1  0  1  0  1  0   1   0   1   0   1   0   1
Of these, how many are linearly separable? (turns out all except XOR and !XOR - feel free to verify)
In general, how many boolean functions can you have for n inputs? 2^(2^n)
How many of these 2^(2^n) functions are not linearly separable? For the time being, it suffices to know that at least some of these may not be linearly separable (I encourage you to figure out the exact answer :-) )
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
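A small enumeration backing up these counts (an added sketch, not from the lecture): there are 2^(2^2) = 16 boolean functions of 2 inputs, and a coarse grid search over perceptron weights finds a separator for 14 of them, all except XOR and its complement.

```python
import numpy as np
from itertools import product

X = np.array(list(product([0, 1], repeat=2)))       # the 4 possible input rows
all_functions = list(product([0, 1], repeat=4))     # the 16 possible output columns

def separable(outputs, grid=np.linspace(-2, 2, 17)):
    # Rule: output 1 iff w0 + w1*x1 + w2*x2 >= 0; the grid (step 0.25) is fine
    # enough to contain a separator for every separable function of 2 binary inputs.
    outputs = np.array(outputs)
    for w0, w1, w2 in product(grid, repeat=3):
        pred = (w0 + X @ np.array([w1, w2]) >= 0).astype(int)
        if np.array_equal(pred, outputs):
            return True
    return False

n_sep = sum(separable(f) for f in all_functions)
print("boolean functions of 2 inputs:", len(all_functions))  # 16
print("linearly separable           :", n_sep)               # 14 (all but XOR, !XOR)
```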
56/1 Module 2.8: Representation Power of a Network of Perceptrons
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
57/1 We will now see how to implement any boolean function using a network of perceptrons ...
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
    58/1 For this discussion,we will assume True = +1 and False = -1 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 364.
    58/1 x1 x2 For thisdiscussion, we will assume True = +1 and False = -1 We consider 2 inputs and 4 perceptrons Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 365.
    58/1 x1 x2 For thisdiscussion, we will assume True = +1 and False = -1 We consider 2 inputs and 4 perceptrons Each input is connected to all the 4 per- ceptrons with specific weights Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 366.
    58/1 x1 x2 red edgeindicates w = -1 blue edge indicates w = +1 For this discussion, we will assume True = +1 and False = -1 We consider 2 inputs and 4 perceptrons Each input is connected to all the 4 per- ceptrons with specific weights Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 367.
    58/1 x1 x2 red edgeindicates w = -1 blue edge indicates w = +1 For this discussion, we will assume True = +1 and False = -1 We consider 2 inputs and 4 perceptrons Each input is connected to all the 4 per- ceptrons with specific weights Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 368.
    58/1 x1 x2 red edgeindicates w = -1 blue edge indicates w = +1 For this discussion, we will assume True = +1 and False = -1 We consider 2 inputs and 4 perceptrons Each input is connected to all the 4 per- ceptrons with specific weights Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 369.
    58/1 x1 x2 red edgeindicates w = -1 blue edge indicates w = +1 For this discussion, we will assume True = +1 and False = -1 We consider 2 inputs and 4 perceptrons Each input is connected to all the 4 per- ceptrons with specific weights Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 370.
    58/1 x1 x2 red edgeindicates w = -1 blue edge indicates w = +1 For this discussion, we will assume True = +1 and False = -1 We consider 2 inputs and 4 perceptrons Each input is connected to all the 4 per- ceptrons with specific weights Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 371.
    58/1 x1 x2 bias =-2 rededge indicates w = -1 blue edge indicates w = +1 For this discussion, we will assume True = +1 and False = -1 We consider 2 inputs and 4 perceptrons Each input is connected to all the 4 per- ceptrons with specific weights The bias (w0) of each perceptron is -2 (i.e., each perceptron will fire only if the weighted sum of its input is ≥ 2) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 372.
    58/1 x1 x2 bias =-2 rededge indicates w = -1 blue edge indicates w = +1 For this discussion, we will assume True = +1 and False = -1 We consider 2 inputs and 4 perceptrons Each input is connected to all the 4 per- ceptrons with specific weights The bias (w0) of each perceptron is -2 (i.e., each perceptron will fire only if the weighted sum of its input is ≥ 2) Each of these perceptrons is connected to an output perceptron by weights (which need to be learned) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 373.
    58/1 x1 x2 bias =-2 rededge indicates w = -1 blue edge indicates w = +1 For this discussion, we will assume True = +1 and False = -1 We consider 2 inputs and 4 perceptrons Each input is connected to all the 4 per- ceptrons with specific weights The bias (w0) of each perceptron is -2 (i.e., each perceptron will fire only if the weighted sum of its input is ≥ 2) Each of these perceptrons is connected to an output perceptron by weights (which need to be learned) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 374.
    58/1 x1 x2 bias =-2 w1w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 For this discussion, we will assume True = +1 and False = -1 We consider 2 inputs and 4 perceptrons Each input is connected to all the 4 per- ceptrons with specific weights The bias (w0) of each perceptron is -2 (i.e., each perceptron will fire only if the weighted sum of its input is ≥ 2) Each of these perceptrons is connected to an output perceptron by weights (which need to be learned) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 375.
    58/1 x1 x2 bias =-2 y w1w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 For this discussion, we will assume True = +1 and False = -1 We consider 2 inputs and 4 perceptrons Each input is connected to all the 4 per- ceptrons with specific weights The bias (w0) of each perceptron is -2 (i.e., each perceptron will fire only if the weighted sum of its input is ≥ 2) Each of these perceptrons is connected to an output perceptron by weights (which need to be learned) The output of this perceptron (y) is the output of this network Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 376.
59/1 [Figure: the same network, with the outputs of the four middle perceptrons labelled h1, h2, h3, h4]
Terminology:
This network contains 3 layers
The layer containing the inputs (x1, x2) is called the input layer
The middle layer containing the 4 perceptrons is called the hidden layer
The final layer containing one output neuron is called the output layer
The outputs of the 4 perceptrons in the hidden layer are denoted by h1, h2, h3, h4
The red and blue edges are called layer 1 weights
w1, w2, w3, w4 are called layer 2 weights
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1 x1 x2 h1 h2 h3 h4 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 We claim that this network can be used to implement any boolean function (linearly separable or not) ! Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1 x1 x2 h1 h2 h3 h4 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 We claim that this network can be used to implement any boolean function (linearly separable or not) ! In other words, we can find w1, w2, w3, w4 such that the truth table of any boolean function can be represented by this network Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1 x1 x2 h1 h2 h3 h4 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 We claim that this network can be used to implement any boolean function (linearly separable or not) ! In other words, we can find w1, w2, w3, w4 such that the truth table of any boolean function can be represented by this network Astonishing claim! Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1 x1 x2 h1 h2 h3 h4 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 We claim that this network can be used to implement any boolean function (linearly separable or not) ! In other words, we can find w1, w2, w3, w4 such that the truth table of any boolean function can be represented by this network Astonishing claim! Well, not really, if you understand what is going on Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1 x1 x2 h1 h2 h3 h4 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 We claim that this network can be used to implement any boolean function (linearly separable or not) ! In other words, we can find w1, w2, w3, w4 such that the truth table of any boolean function can be represented by this network Astonishing claim! Well, not really, if you understand what is going on Each perceptron in the middle layer fires only for a specific input (and no two perceptrons fire for the same input) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1 x1 x2 h1 h2 h3 h4 -1,-1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 We claim that this network can be used to implement any boolean function (linearly separable or not) ! In other words, we can find w1, w2, w3, w4 such that the truth table of any boolean function can be represented by this network Astonishing claim! Well, not really, if you understand what is going on Each perceptron in the middle layer fires only for a specific input (and no two perceptrons fire for the same input) the first perceptron fires for {-1,-1} Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1 x1 x2 h1 h2 h3 h4 -1,-1 -1,1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 We claim that this network can be used to implement any boolean function (linearly separable or not) ! In other words, we can find w1, w2, w3, w4 such that the truth table of any boolean function can be represented by this network Astonishing claim! Well, not really, if you understand what is going on Each perceptron in the middle layer fires only for a specific input (and no two perceptrons fire for the same input) the second perceptron fires for {-1,1} Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1 x1 x2 h1 h2 h3 h4 -1,-1 -1,1 1,-1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 We claim that this network can be used to implement any boolean function (linearly separable or not) ! In other words, we can find w1, w2, w3, w4 such that the truth table of any boolean function can be represented by this network Astonishing claim! Well, not really, if you understand what is going on Each perceptron in the middle layer fires only for a specific input (and no two perceptrons fire for the same input) the third perceptron fires for {1,-1} Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1 x1 x2 h1 h2 h3 h4 -1,-1 -1,1 1,-1 1,1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 We claim that this network can be used to implement any boolean function (linearly separable or not) ! In other words, we can find w1, w2, w3, w4 such that the truth table of any boolean function can be represented by this network Astonishing claim! Well, not really, if you understand what is going on Each perceptron in the middle layer fires only for a specific input (and no two perceptrons fire for the same input) the fourth perceptron fires for {1,1} Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
60/1 x1 x2 h1 h2 h3 h4 -1,-1 -1,1 1,-1 1,1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 We claim that this network can be used to implement any boolean function (linearly separable or not) ! In other words, we can find w1, w2, w3, w4 such that the truth table of any boolean function can be represented by this network Astonishing claim! Well, not really, if you understand what is going on Each perceptron in the middle layer fires only for a specific input (and no two perceptrons fire for the same input) Let us see why this network works by taking an example of the XOR function Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
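The one-hot behaviour of the hidden layer can be checked directly. Below is a minimal sketch (illustrative code, not from the lecture; the function name hidden_layer and the use of a simple step threshold are assumptions) of the 4 hidden perceptrons just described, each with bias -2 and weights copying one input pattern, so each fires for exactly one of the four inputs.

from itertools import product

# layer-1 weights: one row per hidden perceptron (red edge = -1, blue edge = +1)
hidden_weights = [(-1, -1), (-1, +1), (+1, -1), (+1, +1)]

def hidden_layer(x1, x2):
    # bias w0 = -2: a hidden perceptron fires (outputs 1) only if w1*x1 + w2*x2 >= 2
    return [1 if w1 * x1 + w2 * x2 >= 2 else 0 for (w1, w2) in hidden_weights]

for x1, x2 in product([-1, +1], repeat=2):
    print((x1, x2), hidden_layer(x1, x2))
# each of the four input patterns activates exactly one hidden perceptron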
61/1 x1 x2 h1 h2 h3 h4 -1,-1 -1,1 1,-1 1,1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 Let w0 be the bias of the output neuron (i.e., it will fire if w1h1 + w2h2 + w3h3 + w4h4 ≥ w0) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
61/1 x1 x2 h1 h2 h3 h4 -1,-1 -1,1 1,-1 1,1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 Let w0 be the bias of the output neuron (i.e., it will fire if w1h1 + w2h2 + w3h3 + w4h4 ≥ w0)
x1 x2 XOR h1 h2 h3 h4 ∑ wihi
0  0   0  1  0  0  0  w1
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
61/1 x1 x2 h1 h2 h3 h4 -1,-1 -1,1 1,-1 1,1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 Let w0 be the bias of the output neuron (i.e., it will fire if w1h1 + w2h2 + w3h3 + w4h4 ≥ w0)
x1 x2 XOR h1 h2 h3 h4 ∑ wihi
0  0   0  1  0  0  0  w1
0  1   1  0  1  0  0  w2
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
61/1 x1 x2 h1 h2 h3 h4 -1,-1 -1,1 1,-1 1,1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 Let w0 be the bias of the output neuron (i.e., it will fire if w1h1 + w2h2 + w3h3 + w4h4 ≥ w0)
x1 x2 XOR h1 h2 h3 h4 ∑ wihi
0  0   0  1  0  0  0  w1
0  1   1  0  1  0  0  w2
1  0   1  0  0  1  0  w3
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
61/1 x1 x2 h1 h2 h3 h4 -1,-1 -1,1 1,-1 1,1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 Let w0 be the bias of the output neuron (i.e., it will fire if w1h1 + w2h2 + w3h3 + w4h4 ≥ w0)
x1 x2 XOR h1 h2 h3 h4 ∑ wihi
0  0   0  1  0  0  0  w1
0  1   1  0  1  0  0  w2
1  0   1  0  0  1  0  w3
1  1   0  0  0  0  1  w4
Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
61/1 x1 x2 h1 h2 h3 h4 -1,-1 -1,1 1,-1 1,1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 Let w0 be the bias of the output neuron (i.e., it will fire if w1h1 + w2h2 + w3h3 + w4h4 ≥ w0)
x1 x2 XOR h1 h2 h3 h4 ∑ wihi
0  0   0  1  0  0  0  w1
0  1   1  0  1  0  0  w2
1  0   1  0  0  1  0  w3
1  1   0  0  0  0  1  w4
This results in the following four conditions to implement XOR: w1 < w0, w2 ≥ w0, w3 ≥ w0, w4 < w0 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
61/1 x1 x2 h1 h2 h3 h4 -1,-1 -1,1 1,-1 1,1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 Let w0 be the bias of the output neuron (i.e., it will fire if w1h1 + w2h2 + w3h3 + w4h4 ≥ w0)
x1 x2 XOR h1 h2 h3 h4 ∑ wihi
0  0   0  1  0  0  0  w1
0  1   1  0  1  0  0  w2
1  0   1  0  0  1  0  w3
1  1   0  0  0  0  1  w4
This results in the following four conditions to implement XOR: w1 < w0, w2 ≥ w0, w3 ≥ w0, w4 < w0 Unlike before, there are no contradictions now and the system of inequalities can be satisfied Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
61/1 x1 x2 h1 h2 h3 h4 -1,-1 -1,1 1,-1 1,1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 Let w0 be the bias of the output neuron (i.e., it will fire if w1h1 + w2h2 + w3h3 + w4h4 ≥ w0)
x1 x2 XOR h1 h2 h3 h4 ∑ wihi
0  0   0  1  0  0  0  w1
0  1   1  0  1  0  0  w2
1  0   1  0  0  1  0  w3
1  1   0  0  0  0  1  w4
This results in the following four conditions to implement XOR: w1 < w0, w2 ≥ w0, w3 ≥ w0, w4 < w0 Unlike before, there are no contradictions now and the system of inequalities can be satisfied Essentially each wi is now responsible for one of the 4 possible inputs and can be adjusted to get the desired output for that input Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
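One concrete assignment that satisfies these four inequalities is w0 = 1, w1 = w4 = 0, w2 = w3 = 1; this particular choice (and the helper names below) is an illustrative assumption, not the only solution. The sketch wires the hidden layer to the output perceptron and reproduces the XOR truth table, mapping boolean 0/1 inputs to the -1/+1 encoding assumed in this discussion.

hidden_weights = [(-1, -1), (-1, +1), (+1, -1), (+1, +1)]  # layer-1 weights

def hidden_layer(x1, x2):
    # each hidden perceptron has bias -2, so it fires only if its weighted sum is >= 2
    return [1 if w1 * x1 + w2 * x2 >= 2 else 0 for (w1, w2) in hidden_weights]

w0 = 1                # bias of the output perceptron (assumed value)
w_out = [0, 1, 1, 0]  # satisfies w1 < w0, w2 >= w0, w3 >= w0, w4 < w0

def network(b1, b2):
    x1, x2 = 2 * b1 - 1, 2 * b2 - 1              # map 0/1 to the -1/+1 encoding
    h = hidden_layer(x1, x2)
    s = sum(w * hi for w, hi in zip(w_out, h))
    return 1 if s >= w0 else 0

for b1 in (0, 1):
    for b2 in (0, 1):
        print(b1, b2, network(b1, b2))           # prints the XOR truth table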
62/1 x1 x2 h1 h2 h3 h4 -1,-1 -1,1 1,-1 1,1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 It should be clear that the same network can be used to represent the remaining 15 boolean functions also Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
62/1 x1 x2 h1 h2 h3 h4 -1,-1 -1,1 1,-1 1,1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 It should be clear that the same network can be used to represent the remaining 15 boolean functions also Each boolean function will result in a different set of non-contradicting inequalities which can be satisfied by appropriately setting w1, w2, w3, w4 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
62/1 x1 x2 h1 h2 h3 h4 -1,-1 -1,1 1,-1 1,1 bias = -2 y w1 w2 w3 w4 red edge indicates w = -1 blue edge indicates w = +1 It should be clear that the same network can be used to represent the remaining 15 boolean functions also Each boolean function will result in a different set of non-contradicting inequalities which can be satisfied by appropriately setting w1, w2, w3, w4 Try it! Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 404.
63/1 What if we have more than 2 inputs ? Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
64/1 Again, each of the 8 perceptrons will fire only for one of the 8 inputs Each of the 8 weights in the second layer is responsible for one of the 8 inputs and can be adjusted to produce the desired output for that input x1 x2 x3 bias = -3 y w1 w2 w3 w4 w5 w6 w7 w8 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 406.
65/1 What if we have n inputs ? Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
66/1 Theorem Any boolean function of n inputs can be represented exactly by a network of perceptrons containing 1 hidden layer with 2^n perceptrons and one output layer containing 1 perceptron Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
66/1 Theorem Any boolean function of n inputs can be represented exactly by a network of perceptrons containing 1 hidden layer with 2^n perceptrons and one output layer containing 1 perceptron Proof (informal): We just saw how to construct such a network Note: A network of 2^n + 1 perceptrons is not necessary but sufficient. For example, we already saw how to represent the AND function with just 1 perceptron Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
66/1 Theorem Any boolean function of n inputs can be represented exactly by a network of perceptrons containing 1 hidden layer with 2^n perceptrons and one output layer containing 1 perceptron Proof (informal): We just saw how to construct such a network Note: A network of 2^n + 1 perceptrons is not necessary but sufficient. For example, we already saw how to represent the AND function with just 1 perceptron Catch: As n increases, the number of perceptrons in the hidden layer obviously increases exponentially Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
  • 410.
67/1 Again, why do we care about boolean functions ? Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
67/1 Again, why do we care about boolean functions ? How does this help us with our original problem, which was to predict whether we like a movie or not? Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
67/1 Again, why do we care about boolean functions ? How does this help us with our original problem, which was to predict whether we like a movie or not? Let us see! Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
68/1 Data matrix: p1: x11 x12 . . . x1n (y1 = 1); p2: x21 x22 . . . x2n (y2 = 1); . . . ; n1: xk1 xk2 . . . xkn (yk = 0); n2: xj1 xj2 . . . xjn (yj = 0); . . . We are given this data about our past movie experience Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
68/1 Data matrix: p1: x11 x12 . . . x1n (y1 = 1); p2: x21 x22 . . . x2n (y2 = 1); . . . ; n1: xk1 xk2 . . . xkn (yk = 0); n2: xj1 xj2 . . . xjn (yj = 0); . . . We are given this data about our past movie experience For each movie, we are given the values of the various factors (x1, x2, . . . , xn) that we base our decision on and we are also given the value of y (like/dislike) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
68/1 Data matrix: p1: x11 x12 . . . x1n (y1 = 1); p2: x21 x22 . . . x2n (y2 = 1); . . . ; n1: xk1 xk2 . . . xkn (yk = 0); n2: xj1 xj2 . . . xjn (yj = 0); . . . We are given this data about our past movie experience For each movie, we are given the values of the various factors (x1, x2, . . . , xn) that we base our decision on and we are also given the value of y (like/dislike) pi's are the points for which the output was 1 and ni's are the points for which it was 0 Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
68/1 Data matrix: p1: x11 x12 . . . x1n (y1 = 1); p2: x21 x22 . . . x2n (y2 = 1); . . . ; n1: xk1 xk2 . . . xkn (yk = 0); n2: xj1 xj2 . . . xjn (yj = 0); . . . We are given this data about our past movie experience For each movie, we are given the values of the various factors (x1, x2, . . . , xn) that we base our decision on and we are also given the value of y (like/dislike) pi's are the points for which the output was 1 and ni's are the points for which it was 0 The data may or may not be linearly separable Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
68/1 x1 x2 x3 bias = -3 y w1 w2 w3 w4 w5 w6 w7 w8 Data matrix: p1: x11 x12 . . . x1n (y1 = 1); p2: x21 x22 . . . x2n (y2 = 1); . . . ; n1: xk1 xk2 . . . xkn (yk = 0); n2: xj1 xj2 . . . xjn (yj = 0); . . . We are given this data about our past movie experience For each movie, we are given the values of the various factors (x1, x2, . . . , xn) that we base our decision on and we are also given the value of y (like/dislike) pi's are the points for which the output was 1 and ni's are the points for which it was 0 The data may or may not be linearly separable The proof that we just saw tells us that it is possible to have a network of perceptrons and learn the weights in this network such that for any given pi or nj the output of the network will be the same as yi or yj (i.e., we can separate the positive and the negative points) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
69/1 The story so far ... Networks of the form that we just saw (containing an input layer, an output layer and one or more hidden layers) are called Multilayer Perceptrons (MLP, in short) Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
69/1 The story so far ... Networks of the form that we just saw (containing an input layer, an output layer and one or more hidden layers) are called Multilayer Perceptrons (MLP, in short) More appropriate terminology would be "Multilayered Network of Perceptrons" but MLP is the more commonly used name Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
69/1 The story so far ... Networks of the form that we just saw (containing an input layer, an output layer and one or more hidden layers) are called Multilayer Perceptrons (MLP, in short) More appropriate terminology would be "Multilayered Network of Perceptrons" but MLP is the more commonly used name The theorem that we just saw gives us the representation power of an MLP with a single hidden layer Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
69/1 The story so far ... Networks of the form that we just saw (containing an input layer, an output layer and one or more hidden layers) are called Multilayer Perceptrons (MLP, in short) More appropriate terminology would be "Multilayered Network of Perceptrons" but MLP is the more commonly used name The theorem that we just saw gives us the representation power of an MLP with a single hidden layer Specifically, it tells us that an MLP with a single hidden layer can represent any boolean function Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2