Deep Learning from Scratch, chapter 6. back propagation

■Activation function
 Why use it?
• Large hidden layer: can represent a complex function
• Small hidden layer: can represent only a simple function
• Input nodes: 13, output nodes: 1
■ Hidden layer 1 (4 nodes): 13 * 4 + 5 = 57
■ Hidden layer 2 (2 nodes): 13 * 2 + 2 * 2 + 3 = 33
Ch4_Feedback
Interaction Lab., Kumoh National Institute of Technology

■Sigmoid function
 $h(x) = \frac{1}{1 + e^{-x}}$
 Smooth curve, continuous variation
 Returns real values
 $h(1) = 0.731$
■ReLU function
 $h(x) = \begin{cases} x & (x \ge 0) \\ 0 & (x < 0) \end{cases}$
 Variants: Leaky ReLU, PReLU
Ch4_Feedback

■Sigmoid function
 Vanishing gradient
• Gradients shrink as they pass backward through sigmoid layers during backpropagation
Ch4_Feedback
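One way to see why the gradient vanishes: the sigmoid derivative $h'(x) = h(x)(1 - h(x))$ is never larger than 0.25, so each sigmoid layer scales the backward signal by at most that factor. The sketch below is only an illustration under simplifying assumptions: it ignores the weights and assumes a hypothetical stack of 10 sigmoid layers, each evaluated at $x = 0$, which is the best case for the derivative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # h'(x) = h(x) * (1 - h(x))
    y = sigmoid(x)
    return y * (1.0 - y)

# The derivative peaks at x = 0, where it equals 0.25.
peak = sigmoid_grad(0.0)

# Hypothetical stack of 10 sigmoid layers (weights ignored for simplicity):
# the backward signal is scaled by the local derivative at every layer.
n_layers = 10
signal = 1.0
for _ in range(n_layers):
    signal *= peak

print(peak)    # 0.25
print(signal)  # 0.25 ** 10, roughly 9.5e-07
```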

■HOG (Histogram of Oriented Gradients)
 Uses an image's local gradients as features of the image
Ch5_Feedback

■GD vs SGD
 Gradient Descent
• Computes the gradient over all the data => 1 h per step
• Takes the best step forward
• 6 steps = 6 h
• Reliable, but too slow
 Stochastic Gradient Descent
• Computes the gradient over only some of the data => 5 min per step
• Takes a quick step forward
• 10 steps = 50 min
• Wanders a little, but moves fast
Ch5_Feedback
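The trade-off above can be sketched on a toy problem. Everything in this example is an assumption made for illustration: the synthetic data (y = 3x plus noise), the learning rate, the batch size of 10, and the step counts. Both loops run the same number of steps, but each SGD step touches 10 samples instead of 1000.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: y = 3x + small noise.
x = rng.normal(size=1000)
y = 3.0 * x + 0.1 * rng.normal(size=1000)

def grad(w, xb, yb):
    # Gradient of 0.5 * mean((w * x - y)^2) with respect to w.
    return np.mean((w * xb - yb) * xb)

lr = 0.1

# Gradient descent: every step uses all 1000 samples.
w_gd = 0.0
for _ in range(100):
    w_gd -= lr * grad(w_gd, x, y)

# Stochastic (mini-batch) gradient descent: every step uses 10 random samples.
w_sgd = 0.0
for _ in range(100):
    idx = rng.choice(len(x), size=10, replace=False)
    w_sgd -= lr * grad(w_sgd, x[idx], y[idx])

print(w_gd, w_sgd)  # both should land near 3; the SGD estimate is noisier
```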

■Optimizer
Ch5_Feedback

Interaction Lab. Kumoh National Institute of Technology
Deep Learning from Scratch
chapter 6. back propagation
JaeYeop Jeong

■Intro
■Computational graph
■Chain rule
■Back propagation
■Implementation of simple layer
■Implementation of activation function layer
■Implementation of Affine/softmax layer
Agenda

■Numerical differentiation is simple and easy to implement
 But it takes a long time to compute
■Back propagation
 Calculates the gradients of the weights efficiently
 Can be understood through formulas or a computational graph
Intro
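To make the cost of numerical differentiation concrete, here is a minimal sketch of a central-difference gradient. The function name `numerical_gradient`, the step `h`, and the test function are assumptions for illustration. The point is the call count: the loss must be evaluated twice for every parameter, which is what makes this approach slow for networks with many weights.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-4):
    """Central-difference gradient of f at x; needs two calls to f per element."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        orig = x.flat[i]
        x.flat[i] = orig + h
        f_plus = f(x)
        x.flat[i] = orig - h
        f_minus = f(x)
        x.flat[i] = orig  # restore the original value
        grad.flat[i] = (f_plus - f_minus) / (2 * h)
    return grad

# Hypothetical test function f(w) = sum(w^2), whose exact gradient is 2w.
counter = {"calls": 0}

def f(w):
    counter["calls"] += 1
    return np.sum(w ** 2)

w = np.array([1.0, -2.0, 3.0])
g = numerical_gradient(f, w)
print(g)                 # close to [2, -4, 6]
print(counter["calls"])  # 6: two evaluations of f for each of the 3 parameters
```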

■A graph of the calculation process
 Node, edge
■Q1
 Hyeonbin bought 2 apples at 100 won each at the supermarket. Find the amount he paid, given that a 10% consumption tax is charged.
Computational graph(1/5)
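Q1 can be worked through as a graph with two multiply nodes. The variable names below are illustrative, not taken from the slides.

```python
apple_price = 100   # won per apple
apple_count = 2
tax = 1.1           # 10% consumption tax

# Node 1 (multiply): price of the apples before tax.
apple_total = apple_price * apple_count   # 200
# Node 2 (multiply): apply the tax.
payment = apple_total * tax               # about 220 (floating point)

print(round(payment))  # 220
```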

■Q2
 Hyeonbin bought 2 apples and 3 mandarins at the supermarket. Apples cost 100 won each and mandarins cost 150 won each. Find the amount he paid when the consumption tax is 10%.
 Construct the computational graph
 Carry out the calculation from left to right
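A sketch of the Q2 graph evaluated from left to right, with one line per node. The variable names are assumptions for illustration.

```python
apple_price, apple_count = 100, 2
mandarin_price, mandarin_count = 150, 3
tax = 1.1  # 10% consumption tax

apple_total = apple_price * apple_count            # multiply node: 200
mandarin_total = mandarin_price * mandarin_count   # multiply node: 450
subtotal = apple_total + mandarin_total            # add node: 650
payment = subtotal * tax                           # multiply node: about 715

print(round(payment))  # 715
```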

■Local computation
 Each node computes its output using only the small range of information directly related to itself
4000 + 200 = 4200

■Why computational graph
 Local computation
 Keep all intermediate calculation results
 Calculate differentials efficiently
• Apple price: $x$, payment: $L$, derivative of the payment with respect to the apple price: $\frac{\partial L}{\partial x}$

■Back propagation of computational graph
 Multiply the incoming signal by the node's local derivative and pass it on in the direction opposite to forward propagation
• $y = f(x) = x^2$, $\frac{\partial y}{\partial x} = 2x$
Chain rule(1/3)
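[Figure: node $f$ with forward input $x$ and output $y$; the backward pass receives $E$ and sends $E \frac{\partial y}{\partial x}$ upstream]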

■$z = t^2$, $t = x + y$
Chain rule(2/3)

Chain rule(3/3)
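$\frac{\partial z}{\partial z} \frac{\partial z}{\partial t} \frac{\partial t}{\partial x} = \frac{\partial z}{\partial t} \frac{\partial t}{\partial x} = \frac{\partial z}{\partial x}$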
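The chain rule for $z = t^2$, $t = x + y$ can be checked numerically. This is a small sketch: the input values and the step `h` are arbitrary choices for illustration.

```python
def forward(x, y):
    t = x + y
    z = t ** 2
    return t, z

x, y = 1.0, 2.0
t, z = forward(x, y)

# Chain rule: dz/dx = dz/dt * dt/dx = 2t * 1.
dz_dt = 2 * t
dt_dx = 1.0
dz_dx = dz_dt * dt_dx

# Central-difference check (h is an arbitrary small step).
h = 1e-4
numeric = (forward(x + h, y)[1] - forward(x - h, y)[1]) / (2 * h)

print(dz_dx)    # 6.0
print(numeric)  # approximately 6.0
```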

■Back propagation of add node
 $z = x + y$, $\frac{\partial z}{\partial x} = 1$, $\frac{\partial z}{\partial y} = 1$
Back propagation(1/5)

■Back propagation of add node
 Add node: passes the upstream gradient on unchanged

■Back propagation of multiply node
 $z = xy$, $\frac{\partial z}{\partial x} = y$, $\frac{\partial z}{\partial y} = x$

■Back propagation of multiply node
 Multiply node: multiplies the upstream gradient by the swapped forward inputs
• The inputs of forward propagation must be stored for the backward pass

■Example

■Multiply layer
Implementation of simple layer(1/3)
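A minimal sketch of a multiply layer, applied to the apple purchase from Q1. The class name `MulLayer`, the method names, and the variable names are assumptions for illustration; the slides themselves show no code.

```python
class MulLayer:
    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        # Store the inputs: backward needs them.
        self.x = x
        self.y = y
        return x * y

    def backward(self, dout):
        # Multiply the upstream gradient by the swapped forward inputs.
        dx = dout * self.y
        dy = dout * self.x
        return dx, dy

# The apple purchase from Q1: 2 apples at 100 won each, 10% tax.
apple, apple_num, tax = 100, 2, 1.1
mul_apple = MulLayer()
mul_tax = MulLayer()

# Forward pass.
apple_price = mul_apple.forward(apple, apple_num)  # 200
price = mul_tax.forward(apple_price, tax)          # about 220

# Backward pass, in the reverse order of the forward pass.
dprice = 1
dapple_price, dtax = mul_tax.backward(dprice)          # 1.1, 200
dapple, dapple_num = mul_apple.backward(dapple_price)  # 2.2, about 110

print(round(price), dapple, round(dapple_num), dtax)  # 220 2.2 110 200
```

Here `dapple` is the derivative of the payment with respect to the price of one apple: raising the apple price by 1 won raises the payment by 2.2 won.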

■Add layer
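A minimal sketch of an add layer. The class name `AddLayer` and the example values are assumptions for illustration. Unlike a multiply layer, it stores nothing, because the backward pass does not depend on the forward inputs.

```python
class AddLayer:
    def forward(self, x, y):
        return x + y

    def backward(self, dout):
        # d(x + y)/dx = d(x + y)/dy = 1, so the upstream gradient passes through unchanged.
        dx = dout * 1
        dy = dout * 1
        return dx, dy

add = AddLayer()
total = add.forward(200, 450)   # 650
dx, dy = add.backward(1.1)      # both equal the upstream gradient
print(total, dx, dy)  # 650 1.1 1.1
```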

■ReLU layer
 $y = \begin{cases} x & (x > 0) \\ 0 & (x \le 0) \end{cases}$, $\frac{\partial y}{\partial x} = \begin{cases} 1 & (x > 0) \\ 0 & (x \le 0) \end{cases}$
Implementation of activation function layer
[Figure: ReLU node. When $x > 0$, the backward pass sends $\frac{\partial L}{\partial y}$ upstream unchanged; when $x \le 0$, it sends 0]
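A minimal sketch of a ReLU layer for NumPy arrays. The class name `Relu` and the `mask` attribute are assumptions for illustration. The mask records where the forward input was not positive, so the backward pass knows where to block the gradient.

```python
import numpy as np

class Relu:
    def __init__(self):
        self.mask = None

    def forward(self, x):
        # Remember where the input was <= 0.
        self.mask = (x <= 0)
        out = x.copy()
        out[self.mask] = 0
        return out

    def backward(self, dout):
        # The gradient passes through where x > 0 and is zeroed where x <= 0.
        dx = dout.copy()
        dx[self.mask] = 0
        return dx

relu = Relu()
x = np.array([[1.0, -0.5], [-2.0, 3.0]])
out = relu.forward(x)
dx = relu.backward(np.ones_like(x))
print(out)  # values: [[1, 0], [0, 3]]
print(dx)   # values: [[1, 0], [0, 1]]
```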

■Sigmoid layer
 $y = \frac{1}{1 + \exp(-x)}$
 "exp" node: $y = \exp(x)$
 "/" node: $y = \frac{1}{x}$

■Sigmoid layer
 $y = \frac{1}{1 + \exp(-x)}$; writing the input to the "/" node, $1 + \exp(-x)$, as $x$, the node computes $y = \frac{1}{x}$

■Sigmoid layer
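Multiplying the local derivatives of the "/", "+", "exp" and "×(−1)" nodes gives $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} y^2 \exp(-x) = \frac{\partial L}{\partial y} y (1 - y)$, so the backward pass needs only the forward output $y$. A minimal sketch follows; the class name `Sigmoid` and the `out` attribute are assumptions for illustration.

```python
import numpy as np

class Sigmoid:
    def __init__(self):
        self.out = None

    def forward(self, x):
        # Keep the output y: it is all the backward pass needs.
        self.out = 1.0 / (1.0 + np.exp(-x))
        return self.out

    def backward(self, dout):
        # dL/dx = dL/dy * y * (1 - y)
        return dout * self.out * (1.0 - self.out)

sig = Sigmoid()
y = sig.forward(np.array([0.0, 1.0]))
dx = sig.backward(np.ones(2))
print(y)   # approximately [0.5, 0.731]
print(dx)  # approximately [0.25, 0.197]
```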

■Affine layer
Implementation of Affine/softmax layer

■Batch affine layer
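A minimal sketch of a batch affine layer computing $Y = XW + B$. The class name `Affine`, the attribute names, and the shapes in the usage example are assumptions for illustration. Because the bias is broadcast to every row of the batch in the forward pass, its gradient is the sum of the upstream gradient over the batch axis.

```python
import numpy as np

class Affine:
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None
        self.db = None

    def forward(self, x):
        # x: (N, n_in), W: (n_in, n_out), b: (n_out,) broadcast over the batch.
        self.x = x
        return x @ self.W + self.b

    def backward(self, dout):
        dx = dout @ self.W.T
        self.dW = self.x.T @ dout
        # The bias was added to all N rows, so its gradient sums over the batch axis.
        self.db = dout.sum(axis=0)
        return dx

W = np.arange(6, dtype=float).reshape(2, 3)
b = np.zeros(3)
layer = Affine(W, b)
x = np.ones((4, 2))              # batch of 4 samples with 2 features each
out = layer.forward(x)
dx = layer.backward(np.ones_like(out))
print(out.shape, dx.shape, layer.dW.shape, layer.db)  # (4, 3) (4, 2) (2, 3) [4. 4. 4.]
```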

■Softmax-with-Loss layer
 Softmax layer
 Cross entropy error

 t : (0, 1, 0)
 y : (0.3, 0.2, 0.5) => y – t : (0.3, -0.8, 0.5)
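The $y - t$ result above can be reproduced with a minimal sketch of a softmax-with-loss layer. The class and function names are assumptions for illustration, and the input scores are chosen as $\log y$ so that the softmax output is exactly the $y$ from the example. With a batch of one sample, the backward output is $y - t$.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_error(y, t):
    # t is one-hot; the small constant avoids log(0).
    return -np.sum(t * np.log(y + 1e-7)) / y.shape[0]

class SoftmaxWithLoss:
    def __init__(self):
        self.y = None
        self.t = None

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)
        return cross_entropy_error(self.y, self.t)

    def backward(self, dout=1):
        # Gradient with respect to the scores: (y - t), averaged over the batch.
        batch_size = self.t.shape[0]
        return dout * (self.y - self.t) / batch_size

t = np.array([[0.0, 1.0, 0.0]])
x = np.log(np.array([[0.3, 0.2, 0.5]]))  # scores whose softmax is (0.3, 0.2, 0.5)
layer = SoftmaxWithLoss()
loss = layer.forward(x, t)
dx = layer.backward()
print(dx)  # approximately [[0.3, -0.8, 0.5]]
```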

Q&A
