1. Artificial Neural Networks from Scratch
Learn to build a neural network from scratch.
o Focus on multi-level feedforward neural networks (multilayer perceptrons)
Training large neural networks is one of the most important workloads in large-scale parallel and distributed systems
o Programming assignments throughout the semester will use this.
3. Artificial neural network example
A neural network consists of layers of artificial neurons and connections between them. Each connection is associated with a weight. Training a neural network means finding the right weights (and biases) such that the error across the training data is minimized.
[Figure: a feedforward network with an input layer, three hidden layers, and an output layer]
4. Training a neural network
A neural network is trained with m training samples $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$
$x^{(i)}$ is an input vector, $y^{(i)}$ is an output vector
Training objective: minimize the prediction error (loss)
$$\min_W \sum_{i=1}^{m} \left(y^{(i)} - f_W(x^{(i)})\right)^2$$
$f_W(x^{(i)})$ is the predicted output vector for the input vector $x^{(i)}$
Approach: gradient descent (stochastic gradient descent, batch gradient descent, mini-batch gradient descent); the standard update rules for the three variants are written out below.
o Use the error to adjust each weight value to reduce the loss. The adjustment amount is proportional to the contribution of that weight to the loss: given an error, adjust the weight a little to reduce the error.
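The slide lists the three variants without spelling them out; they differ only in how many training samples contribute to each weight update. With learning rate $\alpha$ (introduced on the next slide) and per-sample loss $E^{(i)}$, the standard update rules are:
$$\text{batch: } W \leftarrow W - \alpha \nabla_W \sum_{i=1}^{m} E^{(i)} \qquad \text{stochastic: } W \leftarrow W - \alpha \nabla_W E^{(i)} \qquad \text{mini-batch: } W \leftarrow W - \alpha \nabla_W \sum_{i \in B} E^{(i)}$$
where $B$ is a small, randomly chosen subset of the training samples.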
5. Stochastic gradient descent
Given one training sample $(x^{(i)}, y^{(i)})$
Compute the output of the neural network $f_W(x^{(i)})$
Training objective: minimize the prediction error (loss). There are different ways to define the error; the following is an example:
$$E = \frac{1}{2}\left(y^{(i)} - f_W(x^{(i)})\right)^2$$
Estimate how much each weight $w_k$ in $W$ contributes to the error: $\frac{\partial E}{\partial w_k}$
Update the weight $w_k$ by $w_k = w_k - \alpha \frac{\partial E}{\partial w_k}$. Here $\alpha$ is the learning rate.
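For the scalar-output case, this loss and the derivative $\frac{\partial E}{\partial O} = O - y$ that the later slides rely on can be written directly (a small sketch, not course code):

```cpp
// Squared-error loss for one training sample with scalar output o,
//   E = 1/2 * (y - o)^2,
// and its derivative dE/do = o - y (the quantity reused on the chain-rule slides below).
double loss(double y, double o)     { return 0.5 * (y - o) * (y - o); }
double dloss_do(double y, double o) { return o - y; }
```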
6. Algorithm for learning an artificial neural network
Initialize the weights $W = [W_0, W_1, \ldots, W_k]$
Training
o For each training sample $(x^{(i)}, y^{(i)})$, use forward propagation to compute the neural network output vector $f_W(x^{(i)})$
o Compute the error $E$ (various definitions)
o Use backward propagation to compute $\frac{\partial E}{\partial W_k}$ for each weight $W_k$
o Update $W_k = W_k - \alpha \frac{\partial E}{\partial W_k}$
o Repeat until $E$ is sufficiently small. (A sketch of this loop follows.)
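A minimal sketch of this loop in C++. The forward, error, and backward routines are passed in as callbacks because their bodies are developed on the following slides; the callback names are placeholders of my own, not course code, and only the control flow of the algorithm is shown.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// Generic training loop: forward propagation, error, backward propagation,
// weight update, repeated until the accumulated error is small enough.
void train(Vec& W,                                   // all weights and biases
           const std::vector<Vec>& X,                // training inputs x^(i)
           const std::vector<Vec>& Y,                // training outputs y^(i)
           std::function<Vec(const Vec&, const Vec&)> forward,              // f_W(x)
           std::function<double(const Vec&, const Vec&)> error,             // E
           std::function<Vec(const Vec&, const Vec&, const Vec&)> backward, // dE/dW_k
           double alpha, double tolerance)
{
    double E;
    do {
        E = 0.0;
        for (std::size_t i = 0; i < X.size(); i++) {
            Vec out = forward(W, X[i]);              // forward propagation
            E += error(out, Y[i]);                   // accumulate the loss
            Vec grad = backward(W, X[i], Y[i]);      // backward propagation
            for (std::size_t k = 0; k < W.size(); k++)
                W[k] -= alpha * grad[k];             // W_k = W_k - alpha * dE/dW_k
        }
    } while (E > tolerance);                         // until E is sufficiently small
}
```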
7. A single neuron
An artificial neuron has two components: (1) a weighted sum and (2) an activation function.
o There are many activation functions: sigmoid, ReLU, etc. (A code sketch of one neuron follows the diagram.)
[Diagram: a neuron with inputs $X_1^{(i)}, \ldots, X_m^{(i)}$, weights $w_1, \ldots, w_m$, and bias $b$; it computes the weighted sum $w_1 X_1^{(i)} + \cdots + w_m X_m^{(i)} + b$ and passes it through the activation function]
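A direct translation of this diagram into code, using the sigmoid activation (a sketch for illustration, not the course's file):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// The sigmoid activation function.
double sigmoid(double s) { return 1.0 / (1.0 + std::exp(-s)); }

// A single neuron: weighted sum of the inputs plus a bias,
// passed through the activation function.
double neuron(const std::vector<double>& x,   // inputs X_1 ... X_m
              const std::vector<double>& w,   // weights w_1 ... w_m
              double b)                       // bias
{
    double s = b;
    for (std::size_t k = 0; k < x.size(); k++)
        s += w[k] * x[k];                     // s = w_1*X_1 + ... + w_m*X_m + b
    return sigmoid(s);                        // output O = sigmoid(s)
}
```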
9. Training for the logic AND with a single neuron
In general, one neuron can be trained to realize a linear function.
The logic AND function is a linear (linearly separable) function:
[Figure: the four input points (0,0), (0,1), (1,0), (1,1) in the plane; a single line separates (1,1) from the other three]
Logic AND (⋀) operation
x1   x2   x1 ∧ x2
0 0 0
0 1 0
1 0 0
1 1 1
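For example, a hand-picked setting (an illustration, not a trained result) of $w_1 = w_2 = 1$ and $b = -1.5$ gives weighted sums $-1.5$, $-0.5$, $-0.5$, and $0.5$ for the four input rows, so thresholding the sigmoid output at $0.5$ reproduces the AND column; training has to find such a separating setting automatically.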
10. Training for the logic AND with a single neuron
Consider the training sample with input ($X_1 = 0$, $X_2 = 1$) and output $Y = 0$.
NN Output = 0.5
Error: $E = \frac{1}{2}(Y - O)^2 = 0.125$
To update $w_1$, $w_2$, and $b$, gradient descent needs to compute $\frac{\partial E}{\partial w_1}$, $\frac{\partial E}{\partial w_2}$, and $\frac{\partial E}{\partial b}$
[Diagram: the neuron with $w_1 = 0$, $w_2 = 0$, $b = 0$ and inputs $X_1 = 0$, $X_2 = 1$; $s = X_1 w_1 + X_2 w_2 + b = 0$, $O = \mathrm{sigmoid}(0) = 0.5$]
11. Chain rule for calculating $\frac{\partial E}{\partial w_1}$, $\frac{\partial E}{\partial w_2}$, and $\frac{\partial E}{\partial b}$
If a variable z depends on the variable y, which itself depends on the variable x, then z depends on x as well, via the intermediate variable y. The chain rule is a formula that expresses this derivative as:
$$\frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx}$$
$$\frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial O} \frac{\partial O}{\partial s} \frac{\partial s}{\partial w_1}$$
[Diagram: same neuron state as on slide 10]
12. Training for the logic AND with a single neuron
$$\frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial O} \frac{\partial O}{\partial s} \frac{\partial s}{\partial w_1}$$
$\frac{\partial E}{\partial O} = \frac{\partial \left(\frac{1}{2}(Y - O)^2\right)}{\partial O} = O - Y = 0.5 - 0 = 0.5$
$\frac{\partial O}{\partial s} = \frac{\partial\, \mathrm{sigmoid}(s)}{\partial s} = \mathrm{sigmoid}(s)\,(1 - \mathrm{sigmoid}(s)) = 0.5\,(1 - 0.5) = 0.25$
$\frac{\partial s}{\partial w_1} = \frac{\partial (X_1 w_1 + X_2 w_2 + b)}{\partial w_1} = X_1 = 0$
To update $w_1$: $w_1 = w_1 - \mathrm{rate} \cdot \frac{\partial E}{\partial w_1} = 0 - 0.1 \cdot 0.5 \cdot 0.25 \cdot 0 = 0$ (assume rate = 0.1)
[Diagram: same neuron state as on slide 10]
13. Training for the logic AND with a single neuron
$$\frac{\partial E}{\partial w_2} = \frac{\partial E}{\partial O} \frac{\partial O}{\partial s} \frac{\partial s}{\partial w_2}$$
$\frac{\partial E}{\partial O} = \frac{\partial \left(\frac{1}{2}(Y - O)^2\right)}{\partial O} = O - Y = 0.5 - 0 = 0.5$
$\frac{\partial O}{\partial s} = \frac{\partial\, \mathrm{sigmoid}(s)}{\partial s} = \mathrm{sigmoid}(s)\,(1 - \mathrm{sigmoid}(s)) = 0.5\,(1 - 0.5) = 0.25$
$\frac{\partial s}{\partial w_2} = \frac{\partial (X_1 w_1 + X_2 w_2 + b)}{\partial w_2} = X_2 = 1$
To update $w_2$: $w_2 = w_2 - \mathrm{rate} \cdot \frac{\partial E}{\partial w_2} = 0 - 0.1 \cdot 0.5 \cdot 0.25 \cdot 1 = -0.0125$
[Diagram: same neuron state as on slide 10]
14. Training for the logic AND with a single neuron
$$\frac{\partial E}{\partial b} = \frac{\partial E}{\partial O} \frac{\partial O}{\partial s} \frac{\partial s}{\partial b}$$
$\frac{\partial E}{\partial O} = \frac{\partial \left(\frac{1}{2}(Y - O)^2\right)}{\partial O} = O - Y = 0.5 - 0 = 0.5$
$\frac{\partial O}{\partial s} = \frac{\partial\, \mathrm{sigmoid}(s)}{\partial s} = \mathrm{sigmoid}(s)\,(1 - \mathrm{sigmoid}(s)) = 0.5\,(1 - 0.5) = 0.25$
$\frac{\partial s}{\partial b} = \frac{\partial (X_1 w_1 + X_2 w_2 + b)}{\partial b} = 1$
To update $b$: $b = b - \mathrm{rate} \cdot \frac{\partial E}{\partial b} = 0 - 0.1 \cdot 0.5 \cdot 0.25 \cdot 1 = -0.0125$
[Diagram: same neuron state as on slide 10]
15. Training for the logic AND with a single neuron
This process is repeated until the error is sufficiently small.
The initial weights should be randomized; gradient descent can get stuck in a local optimum.
See lect7/one.cpp for training the logic AND operation with a single neuron (a sketch in the same spirit follows).
Note: the logic XOR operation is non-linear and cannot be trained with one neuron.
[Diagram: the neuron after one update: $w_1 = 0$, $w_2 = -0.0125$, $b = -0.0125$; for $X_1 = 0$, $X_2 = 1$, $s = X_1 w_1 + X_2 w_2 + b = -0.025$ and $O = \mathrm{sigmoid}(-0.025) \approx 0.494$]
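A minimal, self-contained sketch of this training loop for the AND neuron, written in the spirit of lect7/one.cpp (the actual course file is not reproduced here; the learning rate and epoch count are illustrative choices):

```cpp
#include <cmath>
#include <cstdio>

double sigmoid(double s) { return 1.0 / (1.0 + std::exp(-s)); }

int main() {
    // The four AND training samples.
    double X[4][2] = {{0,0}, {0,1}, {1,0}, {1,1}};
    double Y[4]    = {0, 0, 0, 1};
    double w1 = 0, w2 = 0, b = 0;      // (a real run should randomize these)
    double rate = 0.1;                 // learning rate (illustrative choice)

    for (int epoch = 0; epoch < 100000; epoch++) {
        for (int i = 0; i < 4; i++) {
            double s = X[i][0]*w1 + X[i][1]*w2 + b;   // weighted sum
            double O = sigmoid(s);                    // neuron output
            double dE_dO = O - Y[i];                  // dE/dO for E = 1/2 (Y-O)^2
            double dO_ds = O * (1.0 - O);             // sigmoid'(s)
            w1 -= rate * dE_dO * dO_ds * X[i][0];     // dE/dw1 = dE/dO * dO/ds * X1
            w2 -= rate * dE_dO * dO_ds * X[i][1];     // dE/dw2 = dE/dO * dO/ds * X2
            b  -= rate * dE_dO * dO_ds;               // dE/db  = dE/dO * dO/ds * 1
        }
    }
    for (int i = 0; i < 4; i++)
        printf("%g AND %g -> %.3f\n", X[i][0], X[i][1],
               sigmoid(X[i][0]*w1 + X[i][1]*w2 + b));
    return 0;
}
```

After training, the output should be close to 0 for the first three rows and close to 1 for (1, 1).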
16. Multi-level feedforward neural networks
A multi-level feedforward neural network is a neural network that consists of multiple levels of neurons. Each level can have many neurons, and the connections between neurons in different levels do not form loops.
o Information moves in one direction (forward): from the input nodes, through the hidden nodes, to the output nodes.
One artificial neuron can only realize a linear function.
Many levels of neurons, each realizing a simple function, can be combined and trained to realize arbitrarily complex functions.
o A single hidden layer (with enough neurons) can be trained to approximate any continuous function.
17. Multi-level feedforward neural network examples
A layer of neurons that does not connect directly to the outputs is called a hidden layer.
[Figure: a feedforward network with an input layer, three hidden layers, and an output layer]
18. Build a 3-level neural network from scratch
3 levels: input level, hidden level, output level
o Other assumptions: fully connected between layers; all neurons use the sigmoid (σ) as the activation function
Notations:
o N0: size of the input level. Input: $IN[N0] = [IN_1, IN_2, \ldots, IN_{N0}]$
o N1: size of the hidden layer
o N2: size of the output layer. Output: $OO[N2] = [OO_1, OO_2, \ldots, OO_{N2}]$
19. Build a 3-level neural network from scratch
Notations:
o N0, N1, N2: sizes of the input layer, hidden layer, and output layer, respectively
o N0×N1 weights from the input layer to the hidden layer. $W0_{i,j}$: the weight from input unit i to hidden unit j. B0[N1] biases: $B0[N1] = [B0_1, B0_2, \ldots, B0_{N1}]$
o N1×N2 weights from the hidden layer to the output layer. $W1_{i,j}$: the weight from hidden unit i to output unit j. B1[N2] biases: $B1[N2] = [B1_1, B1_2, \ldots, B1_{N2}]$
o $W0[N0][N1] = \begin{pmatrix} W0_{1,1} & \cdots & W0_{1,N1} \\ \vdots & \ddots & \vdots \\ W0_{N0,1} & \cdots & W0_{N0,N1} \end{pmatrix}$,  $W1[N1][N2] = \begin{pmatrix} W1_{1,1} & \cdots & W1_{1,N2} \\ \vdots & \ddots & \vdots \\ W1_{N1,1} & \cdots & W1_{N1,N2} \end{pmatrix}$ (a forward-propagation sketch using this notation follows)
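The slides that walk through forward propagation are not included in this section, so here is a sketch that follows the notation above. HS/HO and OS/OO denote the weighted sums and outputs of the hidden and output layers, matching the names used on the backward-propagation slide below; this is an illustration, not the course's 3level.cpp.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Forward propagation for the 3-level network:
// IN[N0] -> (W0, B0) -> hidden sums HS[N1], hidden outputs HO[N1]
//        -> (W1, B1) -> output sums OS[N2], final outputs OO[N2].
using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;   // W0 is N0 x N1, W1 is N1 x N2

double sigmoid(double s) { return 1.0 / (1.0 + std::exp(-s)); }

void forward(const Vec& IN, const Mat& W0, const Vec& B0,
             const Mat& W1, const Vec& B1,
             Vec& HS, Vec& HO, Vec& OS, Vec& OO)
{
    std::size_t N0 = IN.size(), N1 = B0.size(), N2 = B1.size();
    HS.assign(N1, 0.0); HO.assign(N1, 0.0);
    OS.assign(N2, 0.0); OO.assign(N2, 0.0);

    for (std::size_t j = 0; j < N1; j++) {            // hidden layer
        HS[j] = B0[j];
        for (std::size_t i = 0; i < N0; i++) HS[j] += IN[i] * W0[i][j];
        HO[j] = sigmoid(HS[j]);
    }
    for (std::size_t j = 0; j < N2; j++) {            // output layer
        OS[j] = B1[j];
        for (std::size_t i = 0; i < N1; i++) OS[j] += HO[i] * W1[i][j];
        OO[j] = sigmoid(OS[j]);
    }
}
```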
30. Backward propagation
The goal is to compute $\frac{\partial E}{\partial W0_{i,j}}$, $\frac{\partial E}{\partial W1_{i,j}}$, $\frac{\partial E}{\partial B0_i}$, and $\frac{\partial E}{\partial B1_i}$.
$\frac{\partial E}{\partial OO}$, $\frac{\partial E}{\partial OS}$, $\frac{\partial E}{\partial B1}$, $\frac{\partial E}{\partial W1}$, and $\frac{\partial E}{\partial HO}$ are done.
Once $\frac{\partial E}{\partial HO}$ is computed, we can repeat the process for the hidden layer by replacing OO with HO, OS with HS, B1 with B0, and W1 with W0 in the derivative expressions. Also, the input is now IN[N0] and the output is HO[N1]. (A code sketch follows.)
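A sketch of this backward pass for the squared-error loss $E = \frac{1}{2}\sum_j (Y_j - OO_j)^2$ and sigmoid activations, using the same names as the forward-propagation sketch above (again an illustration, not the course's 3level.cpp). It relies on $\mathrm{sigmoid}'(s) = O(1 - O)$, so every derivative can be formed from the values stored during forward propagation.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;

// Backward propagation for the 3-level network, filling in the gradients of
// E with respect to W0, B0, W1, and B1.
void backward(const Vec& IN, const Vec& HO, const Vec& OO, const Vec& Y,
              const Mat& W1,
              Mat& dW0, Vec& dB0, Mat& dW1, Vec& dB1)
{
    std::size_t N0 = IN.size(), N1 = HO.size(), N2 = OO.size();
    dW0.assign(N0, Vec(N1, 0.0)); dB0.assign(N1, 0.0);
    dW1.assign(N1, Vec(N2, 0.0)); dB1.assign(N2, 0.0);

    // Output layer: dE/dOS_j = (OO_j - Y_j) * OO_j * (1 - OO_j)
    Vec dOS(N2);
    for (std::size_t j = 0; j < N2; j++)
        dOS[j] = (OO[j] - Y[j]) * OO[j] * (1.0 - OO[j]);
    for (std::size_t j = 0; j < N2; j++) {
        dB1[j] = dOS[j];                                   // dE/dB1_j
        for (std::size_t i = 0; i < N1; i++)
            dW1[i][j] = HO[i] * dOS[j];                    // dE/dW1_{i,j}
    }

    // Hidden layer: dE/dHO_i = sum_j dE/dOS_j * W1_{i,j}, then apply sigmoid'.
    Vec dHS(N1);
    for (std::size_t i = 0; i < N1; i++) {
        double dHO = 0.0;
        for (std::size_t j = 0; j < N2; j++) dHO += dOS[j] * W1[i][j];
        dHS[i] = dHO * HO[i] * (1.0 - HO[i]);              // dE/dHS_i
    }
    for (std::size_t j = 0; j < N1; j++) {
        dB0[j] = dHS[j];                                   // dE/dB0_j
        for (std::size_t i = 0; i < N0; i++)
            dW0[i][j] = IN[i] * dHS[j];                    // dE/dW0_{i,j}
    }
}
```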
31. Summary
The output of a layer is the input of the next layer.
Backward propagation uses results from forward propagation.
o $\frac{\partial E}{\partial IN} = \frac{\partial E}{\partial O} W^T$,  $\frac{\partial E}{\partial W} = IN^T \frac{\partial E}{\partial O}$,  $\frac{\partial E}{\partial B} = \frac{\partial E}{\partial O}$ (a code sketch of these formulas follows the diagram below)
[Diagram: a generic layer with input IN and output O, with the gradient $\frac{\partial E}{\partial O}$ flowing back to $\frac{\partial E}{\partial IN}$; three such layers are chained as X → Layer 1 → H1 → Layer 2 → H2 → Layer 3 → Y]
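A compact sketch of these three formulas for one generic layer (activation functions are omitted so the matrix structure stays visible; all names are illustrative, not from 3level.cpp):

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;

// One generic layer O = IN * W + B in matrix form (IN is 1 x n, W is n x m):
//   dE/dIN = dE/dO * W^T,   dE/dW = IN^T * dE/dO,   dE/dB = dE/dO.
void layer_backward(const Vec& IN, const Mat& W, const Vec& dE_dO,
                    Vec& dE_dIN, Mat& dE_dW, Vec& dE_dB)
{
    std::size_t n = IN.size(), m = dE_dO.size();
    dE_dIN.assign(n, 0.0);
    dE_dW.assign(n, Vec(m, 0.0));
    dE_dB = dE_dO;                                   // dE/dB = dE/dO
    for (std::size_t i = 0; i < n; i++)
        for (std::size_t j = 0; j < m; j++) {
            dE_dIN[i]  += dE_dO[j] * W[i][j];        // dE/dIN = dE/dO * W^T
            dE_dW[i][j] = IN[i] * dE_dO[j];          // dE/dW  = IN^T * dE/dO
        }
}
```

The computed dE_dIN is exactly the $\frac{\partial E}{\partial O}$ of the previous layer, which is what lets the layers in the diagram be chained.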
32. Training for the logic XOR and AND with a 6-unit 2-level neural network
The logic XOR function is not a linear function (it can't be trained with lect8/one.cpp). See 3level.cpp; a hand-picked XOR network is also sketched below.
Logic XOR (⨁) operation
x1   x2   x1 ⊕ x2
0 0 0
0 1 1
1 0 1
1 1 0
[Figure: the four input points (0,0), (0,1), (1,0), (1,1) in the plane; AND is separable by a single line, XOR is not]
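To see why one hidden layer is enough, here is a hand-picked 2-2-1 sigmoid network that computes XOR (an illustration only; these weights are chosen by hand, not what 3level.cpp would learn): one hidden unit approximates OR, the other approximates NAND, and the output unit ANDs them together.

```cpp
#include <cmath>
#include <cstdio>

double sigmoid(double s) { return 1.0 / (1.0 + std::exp(-s)); }

// Hand-picked 2-2-1 network: h1 ~ OR, h2 ~ NAND, output ~ AND(h1, h2) = XOR.
double xor_net(double x1, double x2) {
    double h1 = sigmoid(20*x1 + 20*x2 - 10);   // ~ x1 OR x2
    double h2 = sigmoid(-20*x1 - 20*x2 + 30);  // ~ NOT (x1 AND x2)
    return sigmoid(20*h1 + 20*h2 - 30);        // ~ h1 AND h2
}

int main() {
    for (int x1 = 0; x1 <= 1; x1++)
        for (int x2 = 0; x2 <= 1; x2++)
            printf("%d XOR %d -> %.3f\n", x1, x2, xor_net(x1, x2));
    return 0;
}
```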
33. Summary
Briefly discuss multi-level feedforward neural networks
The training of neural networks
Following 3level.cpp, one should be able to write a program for any multi-level feedforward neural network.