Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks
Andres Mendez-Vazquez
May 5, 2019
Outline
1 What are Neural Networks?
Structure of a Neural Cell
Pigeon Experiment
Formal Definition of Artificial Neural Network
Basic Elements of an Artificial Neuron
Types of Activation Functions
McCulloch-Pitts model
More Advanced Models
The Problem of the Vanishing Gradient
Fixing the Problem: The ReLU Function
2 Neural Network As a Graph
Introduction
Feedback
Neural Architectures
Knowledge Representation
Design of a Neural Network
Representing Knowledge in a Neural Network
What are Neural Networks?
Basic Intuition
The human brain is a highly complex, nonlinear, and parallel computer.
It is organized as a network with (Ramón y Cajal, 1911):
1 Basic Processing Units ≈ Neurons
2 Connections ≈ Axons and Dendrites
Example
[Figure: structure of a neural cell, showing the apical dendrites, basal dendrites, dendritic spines, cell body, synaptic inputs, and axon.]
Silicon Chip vs. Neurons
Speed Differential
1 Speeds in silicon chips are in the nanosecond range (10⁻⁹ s).
2 Speeds in human neural networks are in the millisecond range (10⁻³ s).
However
We have massive parallelism in the human brain:
1 10 billion neurons in the human cortex.
2 60 trillion synapses or connections.
High Energy Efficiency
1 The human brain uses 10⁻¹⁶ joules per operation.
2 The best computers use 10⁻⁶ joules per operation.
Pigeon Experiment
Watanabe et al. 1995
Pigeons as art experts
Experiment
The pigeon is placed in a Skinner box.
Then, paintings by two different artists (e.g., Chagall / Van Gogh) are presented to it.
A reward is given for pecking when a particular artist (e.g., Van Gogh) is presented.
The Pigeon in the Skinner Box
Something like this
Results
Something Notable
Pigeons were able to discriminate between Van Gogh and Chagall with 95% accuracy (when presented with pictures they had been trained on).
Discrimination was still 85% successful for previously unseen paintings by the two artists.
Thus
1 Pigeons do not simply memorize the pictures.
2 They can extract and recognize patterns (the “style”).
3 They generalize from what they have already seen to make predictions.
4 This is what neural networks (biological and artificial) are good at (unlike conventional computers).
Formal Definition
Definition
An artificial neural network is a massively parallel distributed processor made up of simple processing units. It resembles the brain in two respects:
1 Knowledge is acquired by the network from its environment through a learning process.
2 Inter-neuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.
Inter-neuron connection strengths?
How does the neuron collect this information?
Some way to aggregate information needs to be devised...
A Classic
Use a summation of the products of weights and inputs!!!
Something like
$$\sum_{i=1}^{m} w_i \times x_i$$
Where: $w_i$ is the strength given to signal $x_i$
However: We still need a way to regulate this “aggregation” (Activation function)
The Model of an Artificial Neuron
Graphical Representation for neuron k
[Figure: block diagram of neuron k: the input signals pass through the synaptic weights into a summing junction, the bias is added, and an activation function produces the output.]
Basic Elements of an Artificial Neuron (AN) Model
Set of Connecting Links
A signal $x_j$ at the input of synapse $j$ connected to neuron $k$ is multiplied by the synaptic weight $w_{kj}$.
The weight of an AN may lie in a negative or positive range.
What about the real neuron? In the classic literature you only have positive values.
Adder
An adder for summing the input signals, weighted by the respective synapses of the neuron.
Activation Function (Squashing Function)
It limits the amplitude of the output of a neuron.
It maps the permissible range of the output signal to an interval.
Mathematically
Adder
$$u_k = \sum_{j=1}^{m} w_{kj} x_j \quad (1)$$
1 $x_1, x_2, \ldots, x_m$ are the input signals.
2 $w_{k1}, w_{k2}, \ldots, w_{km}$ are the synaptic weights.
3 Together with the bias, this is also known as an “Affine Transformation.”
Activation function
$$y_k = \varphi(u_k + b_k) \quad (2)$$
1 $y_k$ is the output of the neuron.
2 $\varphi$ is the activation function.
Integrating the Bias
The Affine Transformation
[Figure: the bias $b_k$ shifts the linear combiner's output $u_k$, giving the induced local field $v_k = u_k + b_k$.]
Thus
Final Equation
$$v_k = \sum_{j=0}^{m} w_{kj} x_j, \qquad y_k = \varphi(v_k)$$
With
$x_0 = 1$ and $w_{k0} = b_k$
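To make these equations concrete, here is a minimal sketch of a single artificial neuron in Python/NumPy; the weights, bias, and inputs are made-up illustrative values, and the sigmoid is just one possible choice of $\varphi$.

```python
import numpy as np

def sigmoid(v):
    """Logistic activation phi(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + np.exp(-v))

def neuron_output(x, w, b, phi=sigmoid):
    """Compute y_k = phi(v_k) with v_k = sum_j w_kj * x_j + b_k."""
    v = np.dot(w, x) + b          # induced local field v_k
    return phi(v)

# Illustrative values (not from the lecture): three inputs, one neuron.
x = np.array([0.5, -1.0, 2.0])    # input signals x_1..x_m
w = np.array([0.1, 0.4, -0.3])    # synaptic weights w_k1..w_km
b = 0.2                           # bias b_k

print(neuron_output(x, w, b))     # output y_k, here squashed into (0, 1)
```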
Types of Activation Functions I
Threshold Function
$$\varphi(v) = \begin{cases} 1 & \text{if } v \geq 0 \\ 0 & \text{if } v < 0 \end{cases} \qquad \text{(Heaviside Function)} \quad (3)$$
In the following picture
[Figure: the Heaviside step function, jumping from 0 to 1 at $v = 0$.]
Thus
We can use this activation function
To generate the first Neural Network Model
Clearly
The model uses the summation as the aggregation operator together with a threshold function.
McCulloch-Pitts Model
McCulloch-Pitts model (pioneers of Neural Networks in the 1940's)
Output
$$y_k = \begin{cases} 1 & \text{if } v_k \geq \theta \\ 0 & \text{if } v_k < \theta \end{cases} \quad (4)$$
(for example, with weights $w_k = (1, 1)^T$ and threshold $\theta = 0$), where the induced local field is
$$v_k = \sum_{j=1}^{m} w_{kj} x_j + b_k \quad (5)$$
It is possible to do classic operations in Boolean Algebra
AND Gate
[Figure: a McCulloch-Pitts neuron wired as an AND gate.]
On the other hand
OR Gate
[Figure: a McCulloch-Pitts neuron wired as an OR gate.]
Finally
NOT Gate
[Figure: a McCulloch-Pitts neuron wired as a NOT gate.]
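As an illustration of how the threshold unit realizes these gates, here is a small sketch; the particular weights, biases, and threshold $\theta = 0$ are one common choice I am assuming, not the only possible one.

```python
import numpy as np

def mcculloch_pitts(x, w, b, theta=0.0):
    """Fire (output 1) when the induced local field reaches the threshold."""
    v = np.dot(w, x) + b
    return 1 if v >= theta else 0

# One possible set of weights/biases (assumed, not taken from the slides):
AND = lambda x1, x2: mcculloch_pitts([x1, x2], w=[1, 1], b=-1.5)
OR  = lambda x1, x2: mcculloch_pitts([x1, x2], w=[1, 1], b=-0.5)
NOT = lambda x1:     mcculloch_pitts([x1],     w=[-1],   b=0.5)

for a in (0, 1):
    for c in (0, 1):
        print(a, c, "AND:", AND(a, c), "OR:", OR(a, c))
print("NOT 0:", NOT(0), "NOT 1:", NOT(1))
```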
And the impact is further understood if you look at this paper
Claude Shannon
“A Symbolic Analysis of Relay and Switching Circuits”
Shannon proved that his switching circuits could be used to simplify the arrangement of the electromechanical relays.
These circuits could solve all problems that Boolean algebra could solve.
Basically, he proved that computer circuits
can solve computationally complex problems...
More advanced activation functions
Piecewise-Linear Function
$$\varphi(v) = \begin{cases} 1 & \text{if } v \geq +\frac{1}{2} \\ v & \text{if } -\frac{1}{2} < v < +\frac{1}{2} \\ 0 & \text{if } v \leq -\frac{1}{2} \end{cases} \quad (6)$$
Example
[Figure: the piecewise-linear function: 0 below $v = -0.5$, a linear ramp in between, and 1 above $v = +0.5$.]
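A minimal sketch of this piecewise-linear squashing function, following Eq. (6) as written above; only the sample points are arbitrary choices of mine.

```python
import numpy as np

def piecewise_linear(v):
    """Piecewise-linear activation of Eq. (6)."""
    v = np.asarray(v, dtype=float)
    # Note: some presentations use v + 1/2 in the linear region so that the
    # ramp joins the saturation levels continuously; Eq. (6) keeps plain v.
    return np.where(v >= 0.5, 1.0,
           np.where(v <= -0.5, 0.0, v))

print(piecewise_linear([-1.0, 0.25, 1.0]))   # [0.   0.25 1.  ]
```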
Remarks
Notes about the Piecewise-Linear function
The amplification factor inside the linear region of operation is assumed to be unity.
Special Cases
A linear combiner arises if the linear region of operation is maintained without running into saturation.
The piecewise-linear function reduces to a threshold function if the amplification factor of the linear region is made infinitely large.
A better choice!!!
Sigmoid Function
$$\varphi(v) = \frac{1}{1 + \exp\{-av\}} \quad (7)$$
Where $a$ is a slope parameter.
As $a \to \infty$, $\varphi$ tends to the threshold function.
Examples
[Figure: sigmoid curves for increasing values of $a$; the curve steepens as $a$ grows.]
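A small sketch illustrating Eq. (7) and its limiting behaviour as the slope parameter $a$ grows; the values of $a$ and the sample points are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(v, a=1.0):
    """phi(v) = 1 / (1 + exp(-a v)), Eq. (7)."""
    return 1.0 / (1.0 + np.exp(-a * v))

v = np.array([-1.0, -0.1, 0.1, 1.0])
for a in (1, 5, 50):                     # increasing slope parameter
    print(a, np.round(sigmoid(v, a), 3))
# As a grows, the outputs approach the 0/1 values of the threshold function.
```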
The Problem of the Vanishing Gradient
When using a non-linearity
However, there is a drawback when using Back-Propagation (as we saw in Machine Learning) under a sigmoid function
$$s(x) = \frac{1}{1 + e^{-x}}$$
Because if we imagine a Deep Neural Network as a series of layer functions $f_i$
$$y(A) = f_t \circ f_{t-1} \circ \cdots \circ f_2 \circ f_1(A)$$
where $f_t$ is the last layer.
Therefore, we finish with a sequence of derivatives
$$\frac{\partial y(A)}{\partial w_{1i}} = \frac{\partial f_t(f_{t-1})}{\partial f_{t-1}} \cdot \frac{\partial f_{t-1}(f_{t-2})}{\partial f_{t-2}} \cdots \frac{\partial f_2(f_1)}{\partial f_1} \cdot \frac{\partial f_1(A)}{\partial w_{1i}}$$
Therefore
Given the commutativity of the product
You can group together the derivatives of the sigmoids
$$f(x) = \frac{ds(x)}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2}$$
Therefore, differentiating again,
$$\frac{df(x)}{dx} = -\frac{e^{-x}}{(1 + e^{-x})^2} + \frac{2(e^{-x})^2}{(1 + e^{-x})^3}$$
After setting $\frac{df(x)}{dx} = 0$
We find that the maximum is at $x = 0$.
Therefore
The maximum of the derivative of the sigmoid is
$$f(0) = 0.25$$
Therefore, given a Deep Convolutional Network
We could finish with
$$\lim_{k \to \infty} \left(\frac{ds(x)}{dx}\right)^{k} \leq \lim_{k \to \infty} (0.25)^{k} = 0$$
A Vanishing Derivative or Vanishing Gradient
Making it quite difficult to train a deeper network using this activation function, in Deep Learning and even in Shallow Learning.
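A quick numerical sketch of this effect: the product of sigmoid derivatives across many layers collapses toward zero even in the best case. The depths chosen are illustrative assumptions only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)      # equals e^{-x} / (1 + e^{-x})^2, with maximum 0.25 at x = 0

# Best case: every layer sits at x = 0, where the derivative is largest.
for depth in (1, 5, 10, 20):
    grad_factor = sigmoid_derivative(0.0) ** depth
    print(depth, grad_factor)
# 1 -> 0.25, 5 -> ~9.8e-4, 10 -> ~9.5e-7, 20 -> ~9.1e-13: the gradient vanishes with depth.
```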
Thus
The need to introduce a new function
$$f(x) = x^{+} = \max(0, x)$$
It is called the ReLU or Rectifier.
With a smooth approximation (the Softplus function)
$$f(x) = \frac{\ln\left(1 + e^{kx}\right)}{k}$$
We have
When k = 1
[Figure: Softplus vs. ReLU: with k = 1 the softplus is a visibly smoothed version of the rectifier.]
Increase k
When k = 10⁴
[Figure: Softplus vs. ReLU: with k = 10⁴ the two curves are essentially indistinguishable.]
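A short sketch of both functions, showing numerically that the softplus approaches the ReLU as $k$ grows, as in the two figures above; the sample points and values of $k$ are arbitrary.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, np.asarray(x, dtype=float))

def softplus(x, k=1.0):
    """Smooth approximation ln(1 + exp(k x)) / k, written in a numerically stable form."""
    kx = k * np.asarray(x, dtype=float)
    return (np.maximum(kx, 0.0) + np.log1p(np.exp(-np.abs(kx)))) / k

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("ReLU            :", relu(x))
for k in (1.0, 10.0, 100.0):
    print(f"softplus (k={k:5.1f}):", np.round(softplus(x, k), 4))
# As k increases, softplus(x, k) converges to max(0, x).
```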
Final Remarks
Although ReLU functions
They can handle the vanishing gradient problem
However, as we will see, saturation starts to appear as a problem
As in Hebbian Learning!!!
Neural Network As a Graph
Definition
A neural network is a directed graph consisting of nodes with interconnecting synaptic and activation links.
Properties
1 Each neuron is represented by a set of linear synaptic links, an externally applied bias, and a possibly nonlinear activation link.
2 The synaptic links of a neuron weight their respective input signals.
3 The weighted sum of the input signals defines the induced local field of the neuron.
4 The activation link squashes the induced local field of the neuron to produce an output.
Example
[Figure: an example network drawn as a directed graph of synaptic and activation links.]
Some Observations
Observation
A partially complete graph describing a neural architecture has the following characteristics:
Source nodes supply input signals to the graph.
Each neuron is represented by a single node called a computation node.
The communication links provide directions of signal flow in the graph.
However
Other representations exist!!!
Three main representations
Block diagram, providing a functional description of the network.
Signal-flow graph, providing a complete description of signal flow in the network.
Architectural graph, describing the network layout (the one we plan to use).
Feedback Model
Feedback
Feedback is said to exist in a dynamic system whenever the output of an element influences, in part, the input applied to that element.
Indeed, feedback occurs in almost every part of the nervous system of every animal (Freeman, 1975).
Feedback Architecture
[Figure: a single-loop feedback system: the forward operator A maps the internal signal $x'_j(n)$ to the output $y_k(n)$, and the feedback operator B feeds the output back to the input.]
Thus
Given $x'_j$ the internal signal and the operator $A$ (the weights product):
$$y_k(n) = A\left[x'_j(n)\right] \quad (8)$$
Thus
The internal signal includes the feedback
Given a $B$ operator:
$$x'_j(n) = x_j(n) + B\left[y_k(n)\right] \quad (9)$$
Then, eliminating $x'_j(n)$,
$$y_k(n) = \frac{A}{1 - AB}\left[x_j(n)\right] \quad (10)$$
Where $\frac{A}{1-AB}$ is known as the closed-loop operator of the system.
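As a numerical sanity check of Eq. (10), here is a sketch that iterates a scalar feedback loop; the values of A, B, and the input are arbitrary, and convergence requires |AB| < 1.

```python
# Scalar single-loop feedback: y(n) = A * (x + B * y(n-1)).
A, B = 0.8, 0.5              # arbitrary forward and feedback operators, |A*B| < 1
x = 1.0                      # constant input signal x_j

y = 0.0
for _ in range(100):         # iterate the loop until it settles
    y = A * (x + B * y)

print(y)                     # iterated output
print(A / (1 - A * B) * x)   # closed-loop operator A / (1 - AB) applied to x
# Both values agree (~1.3333), matching the closed-loop formula.
```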
Neural Architectures I
Single-Layer Feedforward Networks
[Figure: an input layer of source nodes projecting directly onto an output layer of neurons.]
Observations
Observations
This network is known as a strictly feedforward or acyclic type.
Neural Architectures II
Multilayer Feedforward Networks
[Figure: an input layer of source nodes, a hidden layer of neurons, and an output layer of neurons, connected layer to layer.]
Observations
Observations
1 This network contains a series of hidden layers.
2 Each hidden layer allows for classification over the new output space produced by the previous hidden layer.
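To make the multilayer feedforward idea concrete, here is a minimal forward pass for one hidden layer and one output layer; the layer sizes, random weights, and sigmoid activation are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(x, W1, b1, W2, b2):
    """Input layer -> hidden layer of neurons -> output layer of neurons."""
    h = sigmoid(W1 @ x + b1)      # induced local fields and outputs of the hidden layer
    y = sigmoid(W2 @ h + b2)      # outputs of the output layer
    return y

# Illustrative sizes: 4 source nodes, 3 hidden neurons, 2 output neurons.
x = rng.normal(size=4)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)

print(forward(x, W1, b1, W2, b2))
```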
Neural Architectures III
Recurrent Networks
[Figure: a recurrent network in which the neuron outputs are fed back to the other neurons through unit-delay operators.]
Observations
Observations
1 This network has no self-feedback loops.
2 It has something known as the unit-delay operator, $B = z^{-1}$.
Knowledge Representation
Definition
By Fischler and Firschein, 1987:
“Knowledge refers to stored information or models used by a person or machine to interpret, predict, and appropriately respond to the outside world.”
It consists of two kinds of information:
1 The known world state, represented by facts about what is and what has been known.
2 Observations (measurements) of the world, obtained by means of sensors.
Observations can be
1 Labeled
2 Unlabeled
Set of Training Data
Training Data
It consists of input-output pairs (x, y)
x = input signal
y = desired output
Thus, we have the following phases in designing a Neural Network
1 Choose an appropriate architecture.
2 Train the network - learning.
1 Use the Training Data!!!
3 Test the network with data not seen before.
1 Use a set of pairs that were not shown to the network, so the y component must be predicted.
4 Then, you can see how well the network behaves - the Generalization Phase.
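A minimal sketch of these design phases in Python; the toy data, the single-neuron architecture, and the gradient-style learning rule are all illustrative assumptions, not the method prescribed by the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Training data: input-output pairs (x, y); here a toy linearly separable set.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Phase 1: choose an architecture (a single sigmoid neuron, for illustration).
w, b = np.zeros(2), 0.0
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

# Phase 2: train the network on the training pairs (simple gradient steps).
X_train, y_train = X[:150], y[:150]
for _ in range(500):
    p = sigmoid(X_train @ w + b)
    w -= 0.5 * (X_train.T @ (p - y_train)) / len(y_train)
    b -= 0.5 * np.mean(p - y_train)

# Phases 3-4: test on pairs never shown to the network (generalization phase).
X_test, y_test = X[150:], y[150:]
pred = (sigmoid(X_test @ w + b) > 0.5).astype(float)
print("generalization accuracy:", np.mean(pred == y_test))
```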
Representing Knowledge in a Neural Network
Notice the Following
The subject of knowledge representation inside an artificial network is very complicated.
However: Pattern Classifiers vs. Neural Networks
1 Pattern classifiers are first designed and then validated by the environment.
2 Neural networks learn the environment by using data drawn from it!!!
Kurt Hornik et al. proved (1989)
“Standard multilayer feedforward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function” (basically, many of the known ones!!!)
Rules of Knowledge Representation
Rule 1
Similar inputs from similar classes should usually produce similar representations.
We can use a metric to measure that similarity!!!
Examples
1 $d(x_i, x_j) = \|x_i - x_j\|$ (Classic Euclidean Metric).
2 $d_{ij}^{2} = (x_i - \mu_i)^{T} \Sigma^{-1} (x_j - \mu_j)$ (Mahalanobis distance), where
1 $\mu_i = E[x_i]$
2 $\Sigma = E\left[(x_i - \mu_i)(x_i - \mu_i)^{T}\right] = E\left[(x_j - \mu_j)(x_j - \mu_j)^{T}\right]$
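A small sketch of both similarity measures from Rule 1; the sample data and the estimated shared covariance $\Sigma$ are illustrative, and $\Sigma$ is assumed invertible.

```python
import numpy as np

rng = np.random.default_rng(2)

def euclidean(xi, xj):
    return np.linalg.norm(xi - xj)

def mahalanobis_sq(xi, xj, mu_i, mu_j, cov):
    """d_ij^2 = (x_i - mu_i)^T Sigma^{-1} (x_j - mu_j)."""
    return (xi - mu_i) @ np.linalg.inv(cov) @ (xj - mu_j)

# Illustrative samples from one class with a shared covariance.
X = rng.multivariate_normal(mean=[0, 0], cov=[[2.0, 0.5], [0.5, 1.0]], size=500)
mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)

xi, xj = X[0], X[1]
print("Euclidean         :", euclidean(xi, xj))
print("Mahalanobis (sq.) :", mahalanobis_sq(xi, xj, mu, mu, Sigma))
```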
More
Rule 2
Items to be categorized as separate classes should be given widely different representations in the network.
Rule 3
If a particular feature is important, then there should be a large number of neurons involved in its representation.
Rule 4
Prior information and invariances should be built into the design:
Thus, simplify the network by not learning that data.
Neural Networks and Deep Learning Syllabus
Andres Mendez-Vazquez
 
Introduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusIntroduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabus
Andres Mendez-Vazquez
 
Ideas 09 22_2018
Ideas 09 22_2018Ideas 09 22_2018
Ideas 09 22_2018
Andres Mendez-Vazquez
 
Ideas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data SciencesIdeas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data Sciences
Andres Mendez-Vazquez
 
Analysis of Algorithms Syllabus
Analysis of Algorithms  SyllabusAnalysis of Algorithms  Syllabus
Analysis of Algorithms Syllabus
Andres Mendez-Vazquez
 
20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations
Andres Mendez-Vazquez
 
18.1 combining models
18.1 combining models18.1 combining models
18.1 combining models
Andres Mendez-Vazquez
 
17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension
Andres Mendez-Vazquez
 
A basic introduction to learning
A basic introduction to learningA basic introduction to learning
A basic introduction to learning
Andres Mendez-Vazquez
 
Introduction Mathematics Intelligent Systems Syllabus
Introduction Mathematics Intelligent Systems SyllabusIntroduction Mathematics Intelligent Systems Syllabus
Introduction Mathematics Intelligent Systems Syllabus
Andres Mendez-Vazquez
 

More from Andres Mendez-Vazquez (20)

2.03 bayesian estimation
2.03 bayesian estimation2.03 bayesian estimation
2.03 bayesian estimation
 
05 linear transformations
05 linear transformations05 linear transformations
05 linear transformations
 
01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors
 
01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues
 
01.02 linear equations
01.02 linear equations01.02 linear equations
01.02 linear equations
 
01.01 vector spaces
01.01 vector spaces01.01 vector spaces
01.01 vector spaces
 
06 recurrent neural_networks
06 recurrent neural_networks06 recurrent neural_networks
06 recurrent neural_networks
 
05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation
 
Zetta global
Zetta globalZetta global
Zetta global
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learning
 
Neural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusNeural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning Syllabus
 
Introduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusIntroduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabus
 
Ideas 09 22_2018
Ideas 09 22_2018Ideas 09 22_2018
Ideas 09 22_2018
 
Ideas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data SciencesIdeas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data Sciences
 
Analysis of Algorithms Syllabus
Analysis of Algorithms  SyllabusAnalysis of Algorithms  Syllabus
Analysis of Algorithms Syllabus
 
20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations
 
18.1 combining models
18.1 combining models18.1 combining models
18.1 combining models
 
17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension
 
A basic introduction to learning
A basic introduction to learningA basic introduction to learning
A basic introduction to learning
 
Introduction Mathematics Intelligent Systems Syllabus
Introduction Mathematics Intelligent Systems SyllabusIntroduction Mathematics Intelligent Systems Syllabus
Introduction Mathematics Intelligent Systems Syllabus
 

Recently uploaded

Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
manasideore6
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
ClaraZara1
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
SyedAbiiAzazi1
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
 
Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
symbo111
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
Vijay Dialani, PhD
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
VENKATESHvenky89705
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Amil Baba Dawood bangali
 
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTSHeap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Soumen Santra
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
aqil azizi
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
BrazilAccount1
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
Kerry Sado
 

Recently uploaded (20)

Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)6th International Conference on Machine Learning & Applications (CMLA 2024)
6th International Conference on Machine Learning & Applications (CMLA 2024)
 
14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application14 Template Contractual Notice - EOT Application
14 Template Contractual Notice - EOT Application
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
 
road safety engineering r s e unit 3.pdf
road safety engineering  r s e unit 3.pdfroad safety engineering  r s e unit 3.pdf
road safety engineering r s e unit 3.pdf
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
 
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTSHeap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
 

01 Introduction to Neural Networks and Deep Learning

  • 1. Introduction to Neural Networks and Deep Learning Introduction to Neural Networks Andres Mendez-Vazquez May 5, 2019 1 / 63
  • 2. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 2 / 63
  • 3. What are Neural Networks? Basic Intuition The human brain is a highly complex, nonlinear and parallel computer It is organized as a Network with (Ramon y Cajal 1911) 1 Basic Processing Units ≈ Neurons 2 Connections ≈ Axons and Dendrites 3 / 63
  • 7. Example Structure of a neural cell (figure labels: apical dendrites, basal dendrites, cell body, dendritic spines, synaptic inputs, axon). 5 / 63
  • 8. Silicon Chip Vs Neurons Speed Differential 1 Speeds in silicon chips are in the nanosecond range ($10^{-9}$ s). 2 Speeds in human neural networks are in the millisecond range ($10^{-3}$ s). However We have massive parallelism in the human brain 1 10 billion neurons in the human cortex. 2 60 trillion synapses or connections High Energy Efficiency 1 The human brain uses $10^{-16}$ joules per operation. 2 The best computers use $10^{-6}$ joules per operation. 6 / 63
  • 16. Pigeon Experiment Watanabe et al. 1995 Pigeons as art experts Experiment The pigeon is in a Skinner box Then, paintings by two different artists (e.g. Chagall / Van Gogh) are presented to it. A reward is given for pecking when presented with a particular artist (e.g. Van Gogh). 8 / 63
  • 19. The Pigeon in the Skinner Box Something like this 9 / 63
  • 20. Results Something Notable Pigeons were able to discriminate between Van Gogh and Chagall with 95% accuracy (when presented with pictures they had been trained on). Discrimination was still 85% successful for previously unseen paintings by the artists. Thus 1 Pigeons do not simply memorize the pictures. 2 They can extract and recognize patterns (the 'style'). 3 They generalize from what they have already seen to make predictions. 4 This is what neural networks (biological and artificial) are good at (unlike a conventional computer). 10 / 63
  • 27. Formal Definition Definition An artificial neural network is a massively parallel distributed processor made up of simple processing units. It resembles the brain in two respects: 1 Knowledge is acquired by the network from its environment through a learning process. 2 Inter-neuron connection strengths, known as synaptic weights, are used to store the acquired knowledge. 12 / 63
  • 30. Inter-neuron connection strengths? How does the neuron collect this information? Some way to aggregate information needs to be devised... A Classic Use a summation of products of weights by inputs!!! Something like $\sum_{i=1}^{m} w_i \times x_i$ Where: $w_i$ is the strength given to signal $x_i$ However: We still need a way to regulate this "aggregation" (Activation function); a small sketch of the summation follows. 13 / 63
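A minimal sketch of the weighted-sum aggregation above; the weights and inputs are hypothetical values chosen only for illustration.

```python
# Weighted-sum aggregation: sum_{i=1..m} w_i * x_i
w = [0.5, -1.2, 0.8]   # synaptic strengths w_i (illustrative)
x = [1.0, 0.3, 2.0]    # input signals x_i (illustrative)

u = sum(w_i * x_i for w_i, x_i in zip(w, x))
print(u)  # 0.5*1.0 - 1.2*0.3 + 0.8*2.0 = 1.74
```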
  • 33. The Model of an Artificial Neuron Graphical representation for neuron k (block-diagram labels: input signals, synaptic weights, bias, summing junction, activation function, output). 14 / 63
  • 35. Basic Elements of an Artificial Neuron (AN) Model Set of Connecting links A signal $x_j$, at the input of synapse $j$ connected to neuron $k$, is multiplied by the synaptic weight $w_{kj}$. The weight of an AN may lie in a negative or positive range. What about the real neuron? In the classic literature you only have positive values. Adder An adder for summing the input signals, weighted by the respective synapses of the neuron. Activation function (Squashing function) It limits the amplitude of the output of a neuron. It maps the permissible range of the output signal to an interval. 16 / 63
  • 39. Mathematically Adder $u_k = \sum_{j=1}^{m} w_{kj} x_j$ (1) 1 $x_1, x_2, \ldots, x_m$ are the input signals. 2 $w_{k1}, w_{k2}, \ldots, w_{km}$ are the synaptic weights. 3 It is also known as an "Affine Transformation." Activation function $y_k = \varphi(u_k + b_k)$ (2) 1 $y_k$ is the output of the neuron. 2 $\varphi$ is the activation function. 17 / 63
  • 46. Integrating the Bias The Affine Transformation (figure: the bias $b_k$ shifts the linear combiner's output $u_k$, giving the induced local field $v_k = u_k + b_k$). 18 / 63
  • 47. Thus Final Equation $v_k = \sum_{j=0}^{m} w_{kj} x_j$, $y_k = \varphi(v_k)$ with $x_0 = 1$ and $w_{k0} = b_k$ 19 / 63
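A small sketch of the final equation, assuming a sigmoid activation only for concreteness: the bias is folded into the weights as $w_{k0} = b_k$ with a constant input $x_0 = 1$. All numeric values are illustrative.

```python
import math

def neuron_output(w, x, bias, phi):
    """Compute y_k = phi(v_k) with v_k = sum_{j=0..m} w_kj * x_j,
    where x_0 = 1 and w_k0 = b_k (bias folded into the weights)."""
    w_ext = [bias] + list(w)   # w_k0 = b_k
    x_ext = [1.0] + list(x)    # x_0 = 1
    v = sum(wj * xj for wj, xj in zip(w_ext, x_ext))  # induced local field v_k
    return phi(v)

sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
print(neuron_output(w=[0.5, -1.2, 0.8], x=[1.0, 0.3, 2.0], bias=-1.0, phi=sigmoid))
```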
  • 50. Types of Activation Functions I Threshold Function $\varphi(v) = \begin{cases} 1 & \text{if } v \geq 0 \\ 0 & \text{if } v < 0 \end{cases}$ (the Heaviside function) (3) In the following picture (figure: unit step at $v = 0$). 21 / 63
  • 52. Thus We can use this activation function To generate the first Neural Network Model Clearly The model uses the summation as aggregation operator and a threshold function. 22 / 63
  • 55. McCulloch-Pitts model McCulloch-Pitts model (Pioneers of Neural Networks in the 1940's) Output $y_k = \begin{cases} 1 & \text{if } v_k \geq \theta \\ 0 & \text{if } v_k < \theta \end{cases}$ (4) (in the gate examples that follow, the threshold is taken as $\theta = 0$ and the weights as $w_k = (1, 1)^T$) with induced local field $v_k = \sum_{j=1}^{m} w_{kj} x_j + b_k$ (5) 24 / 63
  • 57. It is possible to do classic operations in Boolean Algebra AND Gate 25 / 63
  • 58. On the other hand OR Gate (both gates are sketched in code below) 26 / 63
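A hedged sketch of how a unit with weights $w_k = (1,1)^T$ realizes the AND and OR gates above. The thresholds used here (2 for AND, 1 for OR) are one standard choice; they are equivalent to the slide's $\theta = 0$ with biases $-2$ and $-1$ folded in.

```python
def mcculloch_pitts(x, weights, theta):
    """Fire (output 1) when the induced local field reaches the threshold theta."""
    v = sum(w * xi for w, xi in zip(weights, x))
    return 1 if v >= theta else 0

# AND: both inputs must be active; OR: a single active input suffices.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "AND:", mcculloch_pitts(x, (1, 1), theta=2),
             "OR:",  mcculloch_pitts(x, (1, 1), theta=1))
```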
  • 60. And the impact is further understood if you look at this paper Claude Shannon, "A Symbolic Analysis of Relay and Switching Circuits" Shannon proved that his switching circuits could be used to simplify the arrangement of the electromechanical relays These circuits could solve all problems that Boolean algebra can solve. Basically, he proved that computer circuits can solve computationally complex problems... 28 / 63
  • 65. More advanced activation function Piecewise-Linear Function $\varphi(v) = \begin{cases} 1 & \text{if } v \geq \frac{1}{2} \\ v & \text{if } -\frac{1}{2} < v < \frac{1}{2} \\ 0 & \text{if } v \leq -\frac{1}{2} \end{cases}$ (6) Example (figure: the function saturates at 0 and 1 outside the linear region around the origin). 30 / 63
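A short sketch implementing equation (6) exactly as written on the slide; the sample points are arbitrary.

```python
def piecewise_linear(v):
    """phi(v) = 1 for v >= 1/2, v for -1/2 < v < 1/2, 0 for v <= -1/2 (eq. 6)."""
    if v >= 0.5:
        return 1.0
    if v <= -0.5:
        return 0.0
    return v

for v in (-1.0, -0.25, 0.0, 0.25, 1.0):
    print(v, piecewise_linear(v))
```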
  • 67. Remarks Notes about Piecewise-Linear function The amplification factor inside the linear region of operation is assumed to be unity. Special Cases A linear combiner arises if the linear region of operation is maintained without running into saturation. The piecewise-linear function reduces to a threshold function if the amplification factor of the linear region is made infinitely large. 31 / 63
  • 70. A better choice!!! Sigmoid function $\varphi(v) = \frac{1}{1 + \exp\{-av\}}$ (7) Where $a$ is a slope parameter. As $a \to \infty$, $\varphi \to$ the threshold function. Examples (figure: sigmoid curves for increasing values of $a$). 32 / 63
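A quick sketch of equation (7) showing the limiting behaviour: as the slope parameter $a$ grows, the sigmoid approaches the threshold function. The sample slopes and points are illustrative.

```python
import math

def sigmoid(v, a=1.0):
    """phi(v) = 1 / (1 + exp(-a*v)), with slope parameter a (eq. 7)."""
    return 1.0 / (1.0 + math.exp(-a * v))

for a in (1, 5, 50):
    print(a, [round(sigmoid(v, a), 3) for v in (-1.0, -0.1, 0.1, 1.0)])
# As a grows, the outputs move toward 0 or 1, i.e. toward the Heaviside step.
```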
  • 73. The Problem of the Vanishing Gradient When using a non-linearity However, there is a drawback when using Back-Propagation (as we saw in Machine Learning) under a sigmoid function $s(x) = \frac{1}{1 + e^{-x}}$ Because if we imagine a Deep Neural Network as a series of layer functions $f_i$ $y(A) = f_t \circ f_{t-1} \circ \cdots \circ f_2 \circ f_1(A)$ with $f_t$ the last layer. Therefore, we finish with a sequence of derivatives $\frac{\partial y(A)}{\partial w_{1i}} = \frac{\partial f_t(f_{t-1})}{\partial f_{t-1}} \cdot \frac{\partial f_{t-1}(f_{t-2})}{\partial f_{t-2}} \cdots \frac{\partial f_2(f_1)}{\partial f_1} \cdot \frac{\partial f_1(A)}{\partial w_{1i}}$ 34 / 63
  • 76. Therefore Given the commutativity of the product You could put together the derivatives of the sigmoids $f(x) = \frac{ds(x)}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2}$ Therefore, deriving again $\frac{df(x)}{dx} = -\frac{e^{-x}}{(1 + e^{-x})^2} + \frac{2(e^{-x})^2}{(1 + e^{-x})^3}$ After making $\frac{df(x)}{dx} = 0$ We find that the maximum is at $x = 0$ 35 / 63
  • 79. Therefore The maximum of the derivative of the sigmoid is $f(0) = 0.25$ Therefore, given a Deep Convolutional Network, we could finish with $\lim_{k \to \infty} \left(\frac{ds(x)}{dx}\right)^k \leq \lim_{k \to \infty} (0.25)^k \to 0$ A Vanishing Derivative or Vanishing Gradient Making it quite difficult to train a deeper network using this activation function, in Deep Learning and even in Shallow Learning (a numeric sketch of this decay follows). 36 / 63
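A numeric sketch of the argument above: the sigmoid derivative equals $s(x)(1 - s(x))$, which is at most $0.25$ (attained at $x = 0$), so a product of many such factors shrinks geometrically.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)        # equals e^{-x} / (1 + e^{-x})^2

print(sigmoid_derivative(0.0))  # 0.25, the maximum of the derivative
for k in (5, 10, 20, 50):
    print(k, 0.25 ** k)          # upper bound on a product of k such factors
```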
  • 83. Thus The need to introduce a new function $f(x) = x^+ = \max(0, x)$ It is called the ReLU, or Rectifier With a smooth approximation (the Softplus function) $f(x) = \frac{\ln\left(1 + e^{kx}\right)}{k}$ 38 / 63
  • 85. We have When $k = 1$ (figure: Softplus vs ReLU; the Softplus curve is visibly smoother around the origin). 39 / 63
  • 86. Increase k When $k = 10^4$ (figure: Softplus vs ReLU; the two curves are nearly indistinguishable). 40 / 63
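A sketch of the ReLU and its softplus smoothing; $k = 100$ is used instead of the slide's $10^4$ only to avoid floating-point overflow, which does not change the qualitative picture.

```python
import math

def relu(x):
    return max(0.0, x)

def softplus(x, k=1.0):
    """Smooth approximation of ReLU: ln(1 + exp(k*x)) / k."""
    return math.log1p(math.exp(k * x)) / k

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, relu(x), round(softplus(x, k=1.0), 4), round(softplus(x, k=100.0), 4))
# With k = 100 the softplus values are already very close to ReLU.
```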
  • 87. Final Remarks Although ReLU functions can handle the vanishing gradient problem However, as we will see, saturation starts to appear as a problem As in Hebbian Learning!!! 41 / 63
  • 90. Neural Network As a Graph Definition A neural network is a directed graph consisting of nodes with interconnecting synaptic and activation links. Properties 1 Each neuron is represented by a set of linear synaptic links, an externally applied bias, and a possibly nonlinear activation link. 2 The synaptic links of a neuron weight their respective input signals. 3 The weighted sum of the input signals defines the induced local field of the neuron. 4 The activation link squashes the induced local field of the neuron to produce an output. 43 / 63
  • 96. Some Observations Observation A partially complete graph describing a neural architecture has the following characteristics: Source nodes supply input signals to the graph. Each neuron is represented by a single node called a computation node. The communication links provide directions of signal flow in the graph. However Other representations exist!!! Three main representations Block diagram, providing a functional description of the network. Signal-flow graph, providing a complete description of signal flow in the network. Architectural graph, describing the network layout (the one we plan to use). 45 / 63
  • 105. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 46 / 63
  • 106. Feedback Model Feedback Feedback is said to exist in a dynamic system whenever the output of an element influences, in part, the input applied to that same element. Indeed, feedback occurs in almost every part of the nervous system of every animal (Freeman, 1975). Feedback Architecture Thus, given the input signal x_j(n), the internal signal x'_j(n), and the operator A (the weight product): Output y_k(n) = A[x'_j(n)] (8) 47 / 63
  • 109. Thus The internal signal includes the feedback. Given an operator B: x'_j(n) = x_j(n) + B[y_k(n)] (9) Then, eliminating x'_j(n): y_k(n) = (A / (1 − AB))[x_j(n)] (10) where A/(1 − AB) is known as the closed-loop operator of the system (a minimal numerical sketch follows). 48 / 63
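To make the closed-loop operator concrete, here is a minimal sketch (not from the slides) that treats A and B as plain scalar gains rather than general operators: iterating the feedback loop converges to (A / (1 − AB)) x whenever |AB| < 1.

```python
# Minimal sketch of the scalar feedback loop (illustrative assumption:
# A and B are scalar gains, not general operators).
def closed_loop_response(x, A=0.8, B=0.5, steps=50):
    """Iterate x'(n) = x + B*y(n-1), y(n) = A*x'(n); return the settled y."""
    y = 0.0
    for _ in range(steps):
        x_internal = x + B * y      # internal signal with feedback, Eq. (9)
        y = A * x_internal          # forward operator, Eq. (8)
    return y

x, A, B = 1.0, 0.8, 0.5
print(closed_loop_response(x, A, B))   # ~1.3333
print(A / (1 - A * B) * x)             # closed-loop operator of Eq. (10)
```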
  • 111. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 49 / 63
  • 112. Neural Architectures I Single-Layer Feedforward Networks [Figure: an input layer of source nodes connected directly to an output layer of neurons] 50 / 63
  • 113. Observations Observations This network is known as a strictly feedforward or acyclic type (a minimal forward-pass sketch follows). 51 / 63
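As a small illustration of the single-layer architecture, the forward pass is a single weight matrix mapping the source nodes directly to the output neurons; the tanh activation below is only one possible choice, not something the slides prescribe.

```python
import numpy as np

def single_layer_forward(x, W, b, phi=np.tanh):
    """Single-layer feedforward network: source nodes -> output neurons.

    x : (n_inputs,) input signal from the source nodes
    W : (n_outputs, n_inputs) synaptic weights
    b : (n_outputs,) biases
    phi : activation function applied element-wise
    """
    return phi(W @ x + b)

# Tiny example with made-up numbers
x = np.array([0.5, -1.0, 2.0])
W = np.array([[0.1, 0.3, -0.2],
              [0.4, -0.1, 0.2]])
b = np.array([0.0, 0.1])
print(single_layer_forward(x, W, b))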
  • 114. Neural Architectures II Multilayer Feedforward Networks [Figure: input layer of source nodes, hidden layer of neurons, output layer of neurons] 52 / 63
  • 115. Observations Observations 1 This network contains a series of hidden layers. 2 Each hidden layer allows classification over the output space produced by the previous hidden layer (see the sketch after this slide). 53 / 63
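A minimal sketch of the multilayer case, assuming one hidden layer: each layer re-maps the output space of the layer before it, which is what the observation above refers to. Sizes and weights below are arbitrary.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2, phi=np.tanh):
    """Multilayer feedforward pass: input -> hidden layer -> output layer."""
    h = phi(W1 @ x + b1)   # hidden layer transforms the input space
    y = phi(W2 @ h + b2)   # output layer operates on the hidden space
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)   # 4 inputs -> 5 hidden units
W2, b2 = rng.normal(size=(2, 5)), np.zeros(2)   # 5 hidden -> 2 outputs
print(mlp_forward(x, W1, b1, W2, b2))
```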
  • 117. Neural Architectures III Recurrent Networks [Figure: recurrent network whose outputs are fed back to the inputs through unit-delay operators] 54 / 63
  • 118. Observations Observations 1 This network has no self-feedback loops. 2 It uses the unit-delay operator B = z^{-1} on its feedback connections (a minimal sketch follows). 55 / 63
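A hedged sketch of the recurrent case: the unit-delay operator z^{-1} simply feeds the previous output back as an extra input at the next time step, and self-feedback is excluded by zeroing the diagonal of the recurrent weight matrix, matching the observation above. All shapes and weights here are illustrative.

```python
import numpy as np

def recurrent_forward(xs, W_in, W_rec, phi=np.tanh):
    """Recurrent network: y(n) = phi(W_in x(n) + W_rec y(n-1)).

    y(n-1) is the output passed through the unit-delay operator z^{-1};
    a zero diagonal in W_rec means no neuron feeds back into itself.
    """
    W_rec = W_rec.copy()
    np.fill_diagonal(W_rec, 0.0)          # no self-feedback loops
    y = np.zeros(W_rec.shape[0])
    outputs = []
    for x in xs:                          # one step per unit delay
        y = phi(W_in @ x + W_rec @ y)
        outputs.append(y)
    return np.array(outputs)

rng = np.random.default_rng(1)
xs = rng.normal(size=(6, 3))              # 6 time steps, 3 inputs each
print(recurrent_forward(xs, rng.normal(size=(2, 3)), rng.normal(size=(2, 2))))
```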
  • 120. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 56 / 63
  • 121. Knowledge Representation Definition By Fischler and Firschein, 1987 “Knowledge refers to stored information or models used by a person or machine to interpret, predict, and appropriately respond to the outside world.” It consists of two kinds of information: 1 The known world state, represented by facts about what is and what has been known. 2 Observations (measurements) of the world, obtained by means of sensors. Observations can be 1 Labeled 2 Unlabeled 57 / 63
  • 127. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 58 / 63
  • 128. Set of Training Data Training Data It consists of input-output pairs (x, y): x = input signal, y = desired output. Thus, we have the following phases of designing a Neural Network 1 Choose an appropriate architecture. 2 Train the network - learning. 1 Use the training data!!! 3 Test the network with data not seen before. 1 Use a set of pairs that were not shown to the network, so the y component must be predicted. 4 Then you can see how well the network behaves - the Generalization Phase (a short sketch of these phases follows). 59 / 63
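The phases above can be sketched with any off-the-shelf learner; the scikit-learn classifier below is only a stand-in for "choose an architecture, train on (x, y) pairs, then test on unseen pairs to estimate generalization", and all dataset and architecture parameters are assumptions for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic (x, y) pairs standing in for the training data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out pairs the network never sees during learning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 1. Choose an architecture (here: one hidden layer of 20 neurons)
net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=1000, random_state=0)

# 2. Train the network on the training data
net.fit(X_train, y_train)

# 3-4. Test on unseen data: this accuracy is the generalization estimate
print("generalization accuracy:", net.score(X_test, y_test))
```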
  • 135. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 60 / 63
  • 136. Representing Knowledge in a Neural Networks Notice the Following The subject of knowledge representation inside an artificial network is very complicated. However: Pattern Classifiers Vs Neural Networks 1 Pattern classifiers are first designed and then validated by the environment. 2 Neural networks learn the environment by using the data from it!!! Kurt Hornik et al. proved (1989) “Standard multilayer feedforward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function” (basically, most functions of practical interest!!!). A small illustration follows. 61 / 63
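As a small illustration of Hornik's result (an illustration only, not a proof), a single-hidden-layer network with a squashing activation can be fit to a nonlinear target such as sin(2x); the target, width, and training settings below are all assumed for the example.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.linspace(-3, 3, 400).reshape(-1, 1)
y = np.sin(2 * X).ravel()                      # nonlinear target function

# One hidden layer with a squashing (tanh) activation, per Hornik et al.
net = MLPRegressor(hidden_layer_sizes=(50,), activation='tanh',
                   max_iter=5000, random_state=0)
net.fit(X, y)

print("max absolute error:", np.max(np.abs(net.predict(X) - y)))
```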
  • 139. Rules Knowledge Representation Rule 1 Similar inputs from similar classes should usually produce similar representations. We can use a metric to measure that similarity!!! Examples 1 d(x_i, x_j) = ||x_i − x_j|| (classic Euclidean metric). 2 d²_ij = (x_i − µ_i)^T Σ^{-1} (x_j − µ_j) (Mahalanobis distance), where 1 µ_i = E[x_i] 2 Σ = E[(x_i − µ_i)(x_i − µ_i)^T] = E[(x_j − µ_j)(x_j − µ_j)^T]. Both are computed in the sketch below. 62 / 63
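Both similarity measures from Rule 1 can be computed directly. In the sketch below, µ and Σ are estimated from sample data and a common mean is used for both points; that is one convenient assumption for the example, not something prescribed by the slides.

```python
import numpy as np

def euclidean(xi, xj):
    """Classic Euclidean metric d(x_i, x_j) = ||x_i - x_j||."""
    return np.linalg.norm(xi - xj)

def mahalanobis_sq(xi, xj, mu_i, mu_j, Sigma):
    """Squared Mahalanobis distance d^2_ij = (x_i - mu_i)^T Sigma^{-1} (x_j - mu_j)."""
    Sigma_inv = np.linalg.inv(Sigma)
    return float((xi - mu_i) @ Sigma_inv @ (xj - mu_j))

# Toy data: estimate mu and Sigma from samples of one class (assumed setup)
rng = np.random.default_rng(2)
samples = rng.normal(size=(200, 3))
mu = samples.mean(axis=0)
Sigma = np.cov(samples, rowvar=False)

xi, xj = samples[0], samples[1]
print("Euclidean:", euclidean(xi, xj))
print("Mahalanobis^2:", mahalanobis_sq(xi, xj, mu, mu, Sigma))
```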
  • 142. More Rule 2 Items to be categorized as separate classes should be given widely different representations in the network. Rule 3 If a particular feature is important, then a large number of neurons should be involved in its representation. Rule 4 Prior information and invariances should be built into the design: this simplifies the network by relieving it from having to learn that knowledge from the data. 63 / 63