Artificial Neural Networks from Scratch
 Learn to build a neural network from scratch.
o Focus on multi-level feedforward neural networks (multi-layer perceptrons)
 Training large neural networks is one of the most important
workloads in large-scale parallel and distributed systems
o Programming assignments throughout the semester will use this.
What do (deep) neural networks do?
 Learning (highly) non-linear functions.
[Figure: the four input points (0,0), (0,1), (1,0), (1,1) plotted in the plane; the XOR outputs 0 and 1 cannot be separated by a single straight line.]
Logic XOR (⨁) operation
x1 x2 x1⨁x2
0  0  0
0  1  1
1  0  1
1  1  0
Artificial neural network example
 A neural network consists of layers of artificial neurons and connections between them.
 Each connection is associated with a weight.
 Training a neural network means finding the weights (and biases) such that the error
across the training data is minimized.
[Figure: an example network with an input layer, three hidden layers (hidden layer 1, 2, 3), and an output layer.]
Training a neural network
 A neural network is trained with m training samples (x(1), y(1)), (x(2), y(2)), …, (x(m), y(m)),
where x(i) is an input vector and y(i) is the corresponding output vector.
 Training objective: minimize the prediction error (loss)
   min_W Σ_{i=1..m} (y(i) − f_W(x(i)))²
where f_W(x(i)) is the predicted output vector for the input vector x(i).
 Approach: gradient descent (stochastic gradient descent, batch gradient descent, mini-batch
gradient descent).
o Use the error to adjust the weight values and reduce the loss. The adjustment of each weight is
proportional to that weight's contribution to the loss – given an error, adjust each weight a little to reduce it.
Stochastic gradient descent
 Given one training sample (x(i), y(i)):
 Compute the output of the neural network, f_W(x(i)).
 Training objective: minimize the prediction error (loss) – there are different ways
to define the error. The following is an example:
   E = (1/2)(y(i) − f_W(x(i)))²
 Estimate how much each weight w_k in W contributes to the error: ∂E/∂w_k
 Update the weight w_k by w_k = w_k − α·∂E/∂w_k. Here α is the learning rate.
Algorithm for learning artificial neural
network
 Initialize the weights 𝑊 = [𝑊0, 𝑊1, …, 𝑊𝑘]
 Training
o For each training sample (x(i), y(i)), use forward propagation to compute
the neural network output vector f_W(x(i))
o Compute the error E (various definitions)
o Use backward propagation to compute ∂E/∂W_k for each weight W_k
o Update W_k = W_k − α·∂E/∂W_k
o Repeat until E is sufficiently small.
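A minimal C++ skeleton of this loop is sketched below. It is illustrative only: forward_propagation, compute_error, and backward_propagation are assumed helper functions (they are not named in the slides), and the weights are kept in a single flat vector.

// Sketch of the generic training loop (assumed helper functions, flat weight vector).
#include <vector>

std::vector<double> forward_propagation(const std::vector<double>& x,
                                        const std::vector<double>& W);
double compute_error(const std::vector<double>& out, const std::vector<double>& y);
std::vector<double> backward_propagation(const std::vector<double>& x,
                                         const std::vector<double>& y,
                                         const std::vector<double>& W);

void train(std::vector<double>& W,
           const std::vector<std::vector<double>>& X,   // inputs x(1)..x(m)
           const std::vector<std::vector<double>>& Y,   // outputs y(1)..y(m)
           double alpha, double target_error)
{
    double E;
    do {
        E = 0.0;
        for (std::size_t i = 0; i < X.size(); ++i) {
            std::vector<double> out = forward_propagation(X[i], W);      // forward pass
            E += compute_error(out, Y[i]);                               // accumulate E
            std::vector<double> g = backward_propagation(X[i], Y[i], W); // dE/dW_k
            for (std::size_t k = 0; k < W.size(); ++k)
                W[k] -= alpha * g[k];                                    // W_k = W_k - α·dE/dW_k
        }
    } while (E > target_error);   // repeat until E is sufficiently small
}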
A single neuron
 An artificial neuron has two components: (1) a weighted sum and (2) an activation function.
o Many activation functions: Sigmoid, ReLU, etc.
[Figure: a neuron with inputs X1(i), …, Xm(i), weights w1, …, wm, and bias b. The weighted sum
w1·X1(i) + ⋯ + wm·Xm(i) + b is passed through the activation function to produce the output.]
Sigmoid function
 σ(x) = sigmoid(x) = 1/(1 + e^(−x))
 The derivative of the sigmoid function:
   d/dx sigmoid(x) = sigmoid(x)·(1 − sigmoid(x))
 σ′(x) = d/dx σ(x) = σ(x)·(1 − σ(x))
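In code, both are one-liners. A possible C++ version (the second function takes the already-computed sigmoid output, which is the form used during backpropagation):

#include <cmath>

// sigma(x) = 1 / (1 + e^(-x))
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// sigma'(x) = sigma(x) * (1 - sigma(x)), expressed in terms of the output o = sigma(x)
double sigmoid_derivative(double o) { return o * (1.0 - o); }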
Training for the logic AND with a single
neuron
 In general, one neuron can only be trained to realize a linearly separable function.
 The logic AND function is linearly separable:
[Figure: the four input points (0,0), (0,1), (1,0), (1,1) plotted in the plane; the single 1-output at (1,1) can be separated from the 0-outputs by a straight line.]
Logic AND (⋀) operation
x1 x2 x1⋀x2
0  0  0
0  1  0
1  0  0
1  1  1
Training for the logic AND with a single
neuron
 Consider training data input (𝑋1=0, 𝑋2 = 1), output Y=0.
 NN Output = 0.5
 Error: E = (1/2)(Y − O)² = 0.125
 To update w1, w2, and b, gradient descent needs to compute ∂E/∂w1, ∂E/∂w2, and ∂E/∂b
[Figure: the neuron with w1 = 0, w2 = 0, b = 0 and inputs X1 = 0, X2 = 1.
Weighted sum: s = X1·w1 + X2·w2 + b = 0. Output: O = Sigmoid(0) = 0.5.]
Chain rule for calculating ∂E/∂w1, ∂E/∂w2, and ∂E/∂b
 If a variable z depends on the variable y, which itself depends on the
variable x, then z depends on x as well, via the intermediate variable y.
The chain rule expresses this derivative as: dz/dx = (dz/dy)·(dy/dx)
 ∂E/∂w1 = (∂E/∂O)·(∂O/∂s)·(∂s/∂w1)
[Figure: the same neuron: w1 = 0, w2 = 0, b = 0, X1 = 0, X2 = 1, s = X1·w1 + X2·w2 + b = 0, O = Sigmoid(0) = 0.5.]
Training for the logic AND with a single
neuron
 ∂E/∂w1 = (∂E/∂O)·(∂O/∂s)·(∂s/∂w1)
∂E/∂O = ∂((1/2)(Y − O)²)/∂O = O − Y = 0.5 − 0 = 0.5
 ∂O/∂s = ∂(sigmoid(s))/∂s = sigmoid(s)·(1 − sigmoid(s)) = 0.5·(1 − 0.5) = 0.25,
∂s/∂w1 = ∂(X1·w1 + X2·w2 + b)/∂w1 = X1 = 0
 To update w1: w1 = w1 − rate·∂E/∂w1 = 0 − 0.1·0.5·0.25·0 = 0
 Assume rate = 0.1
[Figure: the same neuron: w1 = 0, w2 = 0, b = 0, X1 = 0, X2 = 1, s = 0, O = Sigmoid(0) = 0.5.]
Training for the logic AND with a single
neuron
 ∂E/∂w2 = (∂E/∂O)·(∂O/∂s)·(∂s/∂w2)
∂E/∂O = ∂((1/2)(Y − O)²)/∂O = O − Y = 0.5 − 0 = 0.5
 ∂O/∂s = ∂(sigmoid(s))/∂s = sigmoid(s)·(1 − sigmoid(s)) = 0.5·(1 − 0.5) = 0.25,
∂s/∂w2 = ∂(X1·w1 + X2·w2 + b)/∂w2 = X2 = 1
 To update w2: w2 = w2 − rate·∂E/∂w2 = 0 − 0.1·0.5·0.25·1 = −0.0125
[Figure: the same neuron: w1 = 0, w2 = 0, b = 0, X1 = 0, X2 = 1, s = 0, O = Sigmoid(0) = 0.5.]
Training for the logic AND with a single
neuron
 ∂E/∂b = (∂E/∂O)·(∂O/∂s)·(∂s/∂b)
∂E/∂O = ∂((1/2)(Y − O)²)/∂O = O − Y = 0.5 − 0 = 0.5
 ∂O/∂s = ∂(sigmoid(s))/∂s = sigmoid(s)·(1 − sigmoid(s)) = 0.5·(1 − 0.5) = 0.25,
∂s/∂b = ∂(X1·w1 + X2·w2 + b)/∂b = 1
 To update b: b = b − rate·∂E/∂b = 0 − 0.1·0.5·0.25·1 = −0.0125
[Figure: the same neuron: w1 = 0, w2 = 0, b = 0, X1 = 0, X2 = 1, s = 0, O = Sigmoid(0) = 0.5.]
Training for the logic AND with a single
neuron
 This process is repeated until the error is sufficiently small.
 The initial weights should be randomized. Gradient descent can get stuck in a local
optimum.
 See lect7/one.cpp for training the logic AND operation with a single neuron (a stand-in sketch follows below).
 Note: the logic XOR operation is not linearly separable and cannot be learned with one neuron.
[Figure: the neuron after this update step: w1 = 0, w2 = −0.0125, b = −0.0125, for inputs X1 = 0, X2 = 1.]
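The whole procedure fits in a short program. The following is only a minimal stand-in for lect7/one.cpp (not the actual course file): it trains one sigmoid neuron on the four AND samples with the update rules derived above, assuming a learning rate of 0.1 and, for simplicity, zero initial weights.

// Minimal single-neuron trainer for logic AND (illustrative stand-in for lect7/one.cpp).
#include <cmath>
#include <cstdio>

static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

int main() {
    const double X1[4] = {0, 0, 1, 1}, X2[4] = {0, 1, 0, 1}, Y[4] = {0, 0, 0, 1};
    double w1 = 0.0, w2 = 0.0, b = 0.0;
    const double rate = 0.1;

    for (int epoch = 0; epoch < 100000; ++epoch) {
        double E = 0.0;
        for (int i = 0; i < 4; ++i) {
            double s = X1[i] * w1 + X2[i] * w2 + b;   // weighted sum
            double O = sigmoid(s);                    // neuron output
            E += 0.5 * (Y[i] - O) * (Y[i] - O);       // E = 1/2 (Y - O)^2
            double dE_dO = O - Y[i];                  // dE/dO
            double dO_ds = O * (1.0 - O);             // dO/ds
            w1 -= rate * dE_dO * dO_ds * X1[i];       // ds/dw1 = X1
            w2 -= rate * dE_dO * dO_ds * X2[i];       // ds/dw2 = X2
            b  -= rate * dE_dO * dO_ds;               // ds/db  = 1
        }
        if (E < 0.01) break;                          // stop when the error is small
    }
    for (int i = 0; i < 4; ++i)
        std::printf("%g AND %g -> %g\n", X1[i], X2[i],
                    sigmoid(X1[i] * w1 + X2[i] * w2 + b));
    return 0;
}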
Multi-level feedforward neural networks
 A multi-level feedforward neural network is a neural network that
consists of multiple levels of neurons. Each level can have many neurons
and connections between neurons in different levels do not form loops.
o Information moves in one direction (forward) from input nodes, through hidden
nodes, to output nodes.
 A single artificial neuron can only realize a linearly separable function.
 Many levels of neurons combine these simple functions and can be trained to approximate arbitrarily
complex functions.
o One hidden layer (with a sufficiently large number of neurons) can approximate any continuous
function.
Multi-level feedforward neural networks
examples
 A layer of neurons that is neither the input layer nor the output layer is called a
hidden layer.
[Figure: an example network with an input layer, three hidden layers, and an output layer.]
Build a 3-level neural network from scratch
 3 levels: Input level, hidden level, output level
o Other assumptions: fully connected between layers, all neurons use sigmoid
(σ ) as the activation function.
 Notations:
o N0: size of the input level. Input: IN[N0] = [IN_1, IN_2, …, IN_N0]
o N1: size of the hidden layer
o N2: size of the output layer. Output: OO[N2] = [OO_1, OO_2, …, OO_N2]
Build a 3-level neural network from scratch
 Notations:
o N0, N1, N2: sizes of the input layer, hidden layer, and output layer, respectively
o N0×N1 weights from the input layer to the hidden layer. W0_{i,j}: the weight from input unit i to
hidden unit j. B1[N1] hidden-layer biases: B1[N1] = [B1_1, B1_2, …, B1_N1]
o N1×N2 weights from the hidden layer to the output layer. W1_{i,j}: the weight from hidden unit i to
output unit j. B2[N2] output-layer biases: B2[N2] = [B2_1, B2_2, …, B2_N2]
o W0[N0][N1] = [ W0_{1,1} ⋯ W0_{1,N1} ; ⋮ ⋱ ⋮ ; W0_{N0,1} ⋯ W0_{N0,N1} ],
  W1[N1][N2] = [ W1_{1,1} ⋯ W1_{1,N2} ; ⋮ ⋱ ⋮ ; W1_{N1,1} ⋯ W1_{N1,N2} ]
3-level feedforward neural network
[Figure: a 3-level feedforward network with N0 input units, N1 hidden units, and N2 output units,
fully connected between consecutive layers.]
Input: IN[N0]
Input-to-hidden weights: W0[N0][N1]
Hidden layer biases: B1[N1]
Hidden layer weighted sum: HS[N1]
Hidden layer output: HO[N1]
Hidden-to-output weights: W1[N1][N2]
Output layer biases: B2[N2]
Output layer weighted sum: OS[N2]
Output: OO[N2]
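These quantities map directly onto plain C++ arrays. A possible set of declarations, used by the sketches that follow, is shown below; the sizes are arbitrary example values (they are not fixed by the slides).

// Illustrative global declarations for the 3-level network sketch.
const int N0 = 2;        // input layer size (example value)
const int N1 = 4;        // hidden layer size (example value)
const int N2 = 1;        // output layer size (example value)

double IN[N0];           // input vector
double W0[N0][N1];       // input-to-hidden weights
double B1[N1];           // hidden layer biases
double HS[N1], HO[N1];   // hidden layer weighted sums and outputs
double W1[N1][N2];       // hidden-to-output weights
double B2[N2];           // output layer biases
double OS[N2], OO[N2];   // output layer weighted sums and outputs
double Y[N2];            // desired output for the current training sample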
Forward propagation (compute OO and E)
 Compute the hidden layer weighted sum: HS[N1] = [HS_1, HS_2, …, HS_N1]
o HS_i = IN_1×W0_{1,i} + IN_2×W0_{2,i} + ⋯ + IN_N0×W0_{N0,i} + B1_i
o In matrix form: HS = IN × W0 + B1
 Compute the hidden layer output: HO[N1] = [HO_1, HO_2, …, HO_N1]
o HO_i = σ(HS_i)
o In matrix form: HO = σ(HS)
Forward propagation
 From the input (IN[N0]), compute the output (OO[N2]) and the error E.
 Compute the output layer weighted sum: OS[N2] = [OS_1, OS_2, …, OS_N2]
o OS_i = HO_1×W1_{1,i} + HO_2×W1_{2,i} + ⋯ + HO_N1×W1_{N1,i} + B2_i
o In matrix form: OS = HO × W1 + B2
 Compute the final output: OO[N2] = [OO_1, OO_2, …, OO_N2]
o OO_i = σ(OS_i)
o In matrix form: OO = σ(OS)
 Let us use the mean square error: E = (1/N2) Σ_{i=1..N2} (OO_i − Y_i)²
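Using the declarations introduced after the network-layout figure, forward propagation is two nested loops plus the error computation. This is only a sketch of the idea, not the code of 3level.cpp.

#include <cmath>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Forward propagation: IN -> HS -> HO -> OS -> OO, returning the mean square error E.
double forward_propagation() {
    for (int j = 0; j < N1; ++j) {                                // hidden layer
        HS[j] = B1[j];
        for (int i = 0; i < N0; ++i) HS[j] += IN[i] * W0[i][j];   // HS = IN x W0 + B1
        HO[j] = sigmoid(HS[j]);                                   // HO = sigma(HS)
    }
    for (int k = 0; k < N2; ++k) {                                // output layer
        OS[k] = B2[k];
        for (int j = 0; j < N1; ++j) OS[k] += HO[j] * W1[j][k];   // OS = HO x W1 + B2
        OO[k] = sigmoid(OS[k]);                                   // OO = sigma(OS)
    }
    double E = 0.0;
    for (int k = 0; k < N2; ++k)
        E += (OO[k] - Y[k]) * (OO[k] - Y[k]) / N2;                // mean square error
    return E;
}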
Backward propagation
 The goal is to compute ∂E/∂W0_{i,j}, ∂E/∂W1_{i,j}, ∂E/∂B1_i, and ∂E/∂B2_i.
 ∂E/∂OO = [∂E/∂OO_1, ∂E/∂OO_2, …, ∂E/∂OO_N2]
         = [(2/N2)(OO_1 − Y_1), (2/N2)(OO_2 − Y_2), …, (2/N2)(OO_N2 − Y_N2)]
Backward propagation
 The goal is to compute ∂E/∂W0_{i,j}, ∂E/∂W1_{i,j}, ∂E/∂B1_i, and ∂E/∂B2_i.
 ∂E/∂OO is done.
 ∂E/∂OS = [(∂E/∂OO_1)(∂OO_1/∂OS_1), (∂E/∂OO_2)(∂OO_2/∂OS_2), …, (∂E/∂OO_N2)(∂OO_N2/∂OS_N2)]
         = [(∂E/∂OO_1)·σ(OS_1)(1 − σ(OS_1)), …, (∂E/∂OO_N2)·σ(OS_N2)(1 − σ(OS_N2))]
 In matrix form: ∂E/∂OS = (2/N2)(OO − Y) ⊙ OO ⊙ (1 − OO)
 This can be stored in an array dE_OS[N2] (see the sketch below).
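In the running C++ sketch, this is one loop that reuses the OO values saved by forward propagation:

double dE_OS[N2];   // dE/dOS, one entry per output unit

// dE/dOS = (2/N2)(OO - Y) (.) OO (.) (1 - OO), element by element
void compute_dE_OS() {
    for (int k = 0; k < N2; ++k)
        dE_OS[k] = (2.0 / N2) * (OO[k] - Y[k]) * OO[k] * (1.0 - OO[k]);
}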
Backward propagation
 The goal is to compute ∂E/∂W0_{i,j}, ∂E/∂W1_{i,j}, ∂E/∂B1_i, and ∂E/∂B2_i.
 ∂E/∂OO, ∂E/∂OS are done.
 ∂E/∂B2 = [(∂E/∂OS_1)(∂OS_1/∂B2_1), (∂E/∂OS_2)(∂OS_2/∂B2_2), …, (∂E/∂OS_N2)(∂OS_N2/∂B2_N2)]
 OS_i = HO_1×W1_{1,i} + HO_2×W1_{2,i} + ⋯ + HO_N1×W1_{N1,i} + B2_i
 Hence, ∂OS_i/∂B2_i = 1.
 ∂E/∂B2 = [∂E/∂OS_1, ∂E/∂OS_2, …, ∂E/∂OS_N2] = ∂E/∂OS
Backward propagation
 The goal is to compute ∂E/∂W0_{i,j}, ∂E/∂W1_{i,j}, ∂E/∂B1_i, and ∂E/∂B2_i.
 ∂E/∂OO, ∂E/∂OS, ∂E/∂B2 are done.
 ∂E/∂W1 = [ (∂E/∂OS_1)(∂OS_1/∂W1_{1,1})   ⋯   (∂E/∂OS_N2)(∂OS_N2/∂W1_{1,N2})
            ⋮                              ⋱   ⋮
            (∂E/∂OS_1)(∂OS_1/∂W1_{N1,1})  ⋯   (∂E/∂OS_N2)(∂OS_N2/∂W1_{N1,N2}) ]
 OS_i = HO_1×W1_{1,i} + HO_2×W1_{2,i} + ⋯ + HO_N1×W1_{N1,i} + B2_i
 Hence, ∂OS_i/∂W1_{j,i} = HO_j.
[Figure: output unit i computes OS_i from the hidden outputs through the weights W1_{1,i}, …, W1_{N1,i}.]
Backward propagation
 The goal is to compute ∂E/∂W0_{i,j}, ∂E/∂W1_{i,j}, ∂E/∂B1_i, and ∂E/∂B2_i.
 ∂E/∂OO, ∂E/∂OS, ∂E/∂B2 are done.
 ∂E/∂W1 = [ (∂E/∂OS_1)(∂OS_1/∂W1_{1,1})   ⋯   (∂E/∂OS_N2)(∂OS_N2/∂W1_{1,N2})
            ⋮                              ⋱   ⋮
            (∂E/∂OS_1)(∂OS_1/∂W1_{N1,1})  ⋯   (∂E/∂OS_N2)(∂OS_N2/∂W1_{N1,N2}) ]
         = [ (∂E/∂OS_1)·HO_1   ⋯   (∂E/∂OS_N2)·HO_1
             ⋮                  ⋱   ⋮
             (∂E/∂OS_1)·HO_N1  ⋯   (∂E/∂OS_N2)·HO_N1 ]
 In matrix form: ∂E/∂W1 = HO^T · (∂E/∂OS)
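In the running sketch, ∂E/∂B2 is just a copy of dE_OS, and ∂E/∂W1 = HO^T·(∂E/∂OS) is an outer-product loop:

double dE_B2[N2];       // dE/dB2
double dE_W1[N1][N2];   // dE/dW1

void compute_dE_B2_and_dE_W1() {
    for (int k = 0; k < N2; ++k)
        dE_B2[k] = dE_OS[k];                  // dE/dB2 = dE/dOS
    for (int j = 0; j < N1; ++j)
        for (int k = 0; k < N2; ++k)
            dE_W1[j][k] = HO[j] * dE_OS[k];   // dE/dW1 = HO^T * dE/dOS (outer product)
}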
Backward propagation
 The goal is to compute ∂E/∂W0_{i,j}, ∂E/∂W1_{i,j}, ∂E/∂B1_i, and ∂E/∂B2_i.
 ∂E/∂OO, ∂E/∂OS, ∂E/∂B2, ∂E/∂W1 are done.
 ∂E/∂HO = [∂E/∂HO_1, ∂E/∂HO_2, …, ∂E/∂HO_N1]
 ∂E/∂HO_i = (∂E/∂OS_1)(∂OS_1/∂HO_i) + (∂E/∂OS_2)(∂OS_2/∂HO_i) + ⋯ + (∂E/∂OS_N2)(∂OS_N2/∂HO_i)
          = (∂E/∂OS_1)·W1_{i,1} + (∂E/∂OS_2)·W1_{i,2} + ⋯ + (∂E/∂OS_N2)·W1_{i,N2}
Backward propagation
 The goal is to compute ∂E/∂W0_{i,j}, ∂E/∂W1_{i,j}, ∂E/∂B1_i, and ∂E/∂B2_i.
 ∂E/∂OO, ∂E/∂OS, ∂E/∂B2, ∂E/∂W1 are done.
 ∂E/∂HO = [∂E/∂HO_1, ∂E/∂HO_2, …, ∂E/∂HO_N1] = (∂E/∂OS) · W1^T
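In the running sketch, the matrix form (∂E/∂OS)·W1^T is one dot product per hidden unit:

double dE_HO[N1];   // dE/dHO

void compute_dE_HO() {
    for (int j = 0; j < N1; ++j) {
        dE_HO[j] = 0.0;
        for (int k = 0; k < N2; ++k)
            dE_HO[j] += dE_OS[k] * W1[j][k];   // dE/dHO = dE/dOS * W1^T
    }
}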
Backward propagation
 The goal is to compute ∂E/∂W0_{i,j}, ∂E/∂W1_{i,j}, ∂E/∂B1_i, and ∂E/∂B2_i.
 ∂E/∂OO, ∂E/∂OS, ∂E/∂B2, ∂E/∂W1, ∂E/∂HO are done.
 Once ∂E/∂HO is computed, we can repeat the same process for the hidden
layer by replacing OO with HO, OS with HS, B2 with B1, and W1 with W0 in the
derivative formulas. For this layer the input is IN[N0] and the output is HO[N1].
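Carrying out that substitution in the running sketch gives the hidden-layer gradients and, finally, the gradient-descent updates for all weights and biases (alpha is an assumed learning-rate parameter):

double dE_HS[N1], dE_B1[N1], dE_W0[N0][N1];

void compute_hidden_gradients_and_update(double alpha) {
    // Same pattern as the output layer, with OO -> HO, OS -> HS, B2 -> B1, W1 -> W0.
    for (int j = 0; j < N1; ++j) {
        dE_HS[j] = dE_HO[j] * HO[j] * (1.0 - HO[j]);   // dE/dHS = dE/dHO (.) HO (.) (1 - HO)
        dE_B1[j] = dE_HS[j];                           // dE/dB1 = dE/dHS
    }
    for (int i = 0; i < N0; ++i)
        for (int j = 0; j < N1; ++j)
            dE_W0[i][j] = IN[i] * dE_HS[j];            // dE/dW0 = IN^T * dE/dHS

    // Gradient descent updates for all weights and biases.
    for (int i = 0; i < N0; ++i)
        for (int j = 0; j < N1; ++j) W0[i][j] -= alpha * dE_W0[i][j];
    for (int j = 0; j < N1; ++j)     B1[j]    -= alpha * dE_B1[j];
    for (int j = 0; j < N1; ++j)
        for (int k = 0; k < N2; ++k) W1[j][k] -= alpha * dE_W1[j][k];
    for (int k = 0; k < N2; ++k)     B2[k]    -= alpha * dE_B2[k];
}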
Summary
 The output of a layer is the input of the next layer.
 Backward propagation uses results from forward propagation.
o ∂E/∂IN = (∂E/∂O)·W^T,   ∂E/∂W = IN^T·(∂E/∂O),   ∂E/∂B = ∂E/∂O
(for a generic layer with input IN, weighted-sum output O, weights W, and biases B)
[Figure: layers chained as X → Layer 1 → H1 → Layer 2 → H2 → Layer 3 → Y; backward propagation feeds ∂E/∂O into each layer, which produces ∂E/∂IN for the layer before it.]
Training for the logic XOR and AND with a 6-
unit 2-level neural network
 Logic XOR function is not a linear function (can’t train with
lect8/one.cpp). See 3level.cpp
Logic XOR (⨁) operation
x1 x2 x1⨁x2
0  0  0
0  1  1
1  0  1
1  1  0
[Figure: in the plane of the four input points (0,0), (0,1), (1,0), (1,1), AND can be separated by a single straight line while XOR cannot.]
Summary
 Briefly discussed multi-level feedforward neural networks
 The training of neural networks
 Following 3level.cpp, one should be able to write a program for any
multi-level feedforward neural network.