1. Deep Learning from Scratch
Chapter 4. Neural Network Learning
JaeYeop Jeong
Interaction Lab., Kumoh National Institute of Technology
2. ■Neural network learning
■Loss function
■Differentiation
■Gradient method
■Learning algorithm implementation
Agenda
3. ■Data is important in machine learning.
■A neural network can solve every problem in the same end-to-end way, whether the input is a '5', a dog, or a human face.
Neural network learning(1/2)
[Figure: three ways of mapping input to output — a hand-designed algorithm; hand-designed features (SIFT, HOG, …) fed into machine learning (SVM, KNN, …); and a neural network (deep learning) that learns directly from the data.]
4. ■Divide the data into training data and test data.
First, use only the training data to find the optimal parameters.
Then evaluate the trained model on the test data, aiming for a universal (generalizable) model.
■Overfitting: fitting one particular dataset so closely that the model performs poorly on new data.
Neural network learning(2/2)
5. ■Use a loss function to find the optimal parameters.
Mean squared error, cross entropy error
■Mean squared error (MSE)
E = (1/2) Σ_k (y_k − t_k)², (y_k = neural network output, t_k = label, k = dimension of the data)
The better the prediction, the smaller the error.
Loss function
[Figure: comparing the softmax output y against a one-hot label — the MSE is 0.0975 when the correct class gets the highest probability, and 0.5975 when a wrong class does.]
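The MSE values on the slide (0.0975 and 0.5975) can be reproduced with a minimal NumPy sketch; the two output arrays are illustrative softmax outputs where the correct class is '2':

```python
import numpy as np

def mean_squared_error(y, t):
    """E = 0.5 * sum((y_k - t_k)^2) over the output dimensions."""
    return 0.5 * np.sum((y - t) ** 2)

# One-hot label: the correct class is '2'.
t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])

# Output that gives the highest probability to '2' (a good prediction).
y_good = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
# Output that gives the highest probability to '7' instead (a bad prediction).
y_bad = np.array([0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0])

print(mean_squared_error(y_good, t))  # 0.0975
print(mean_squared_error(y_bad, t))   # 0.5975
```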
6. ■Cross entropy error (CEE)
E = −Σ_k t_k log y_k, (y_k = neural network output, t_k = label, k = dimension of the data; log is the natural logarithm)
• t_k is one-hot encoded
• So in practice only the output for the correct class enters the sum
Label is '2', y_2 = 0.6 → −log 0.6 ≈ 0.51
Label is '2', y_2 = 0.1 → −log 0.1 ≈ 2.30
That is, CEE is determined entirely by the output for the correct label: the closer that output is to 1, the smaller the error.
Loss function
[Figure: the curve y = log x; in implementation a small delta is added inside the log to avoid log(0), giving 0.5108 and 2.3025 for the two examples above.]
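A minimal sketch of the CEE above; the small `delta` inside the log is the guard against `log(0) = -inf` mentioned in the figure:

```python
import numpy as np

def cross_entropy_error(y, t):
    """E = -sum(t_k * log(y_k)); delta prevents log(0)."""
    delta = 1e-7
    return -np.sum(t * np.log(y + delta))

t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])  # correct class is '2'

y1 = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])
y2 = np.array([0.1, 0.05, 0.1, 0.0, 0.05, 0.1, 0.0, 0.6, 0.0, 0.0])

print(cross_entropy_error(y1, t))  # ≈ 0.5108 (= -log 0.6)
print(cross_entropy_error(y2, t))  # ≈ 2.3025 (= -log 0.1)
```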
7. ■Mini-batch learning
Machine learning models are trained on data:
• Compute the loss function over the training data and find the parameters that minimize it.
• That is, with 100 training examples, the sum of 100 loss values is used.
• With big data, summing over every example becomes impractical.
E = −(1/N) Σ_n Σ_k t_nk log y_nk, (y_nk = neural network output, t_nk = label, for the n-th example and k-th dimension)
• The average loss per example, comparable regardless of the number of data points
■Mini-batch: select, e.g., 100 out of 60,000 examples at random
• and learn using only those 100 examples
Loss function
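Mini-batch selection can be sketched with `np.random.choice`; the arrays below are random placeholders whose shapes stand in for the 60,000 MNIST training examples used in the book:

```python
import numpy as np

# Placeholder data standing in for MNIST (shapes only; values are random).
train_size = 60000
x_train = np.random.rand(train_size, 784)  # 28x28 images, flattened
t_train = np.random.rand(train_size, 10)   # one-hot labels

# Randomly pick the indices of a mini-batch of 100 examples.
batch_size = 100
batch_mask = np.random.choice(train_size, batch_size)
x_batch = x_train[batch_mask]
t_batch = t_train[batch_mask]

print(x_batch.shape)  # (100, 784)
print(t_batch.shape)  # (100, 10)
```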
8. ■Mini-batch implementation
CEE implementation must handle both label formats:
• t : one-hot encoded
• t : integer labels (not one-hot)
When t holds integer labels, NumPy fancy indexing picks out the output for each correct class:
batch_size = 5 → np.arange(batch_size) = [0, 1, 2, 3, 4]
t = [2, 7, 0, 9, 4]
y[np.arange(batch_size), t] = [y[0, 2], y[1, 7], y[2, 0], y[3, 9], y[4, 4]]
Loss function
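A sketch of a batch CEE that supports both formats, following the slide's indexing example (the example batch `y` is made up for illustration — every row gives probability 0.64 to its correct class):

```python
import numpy as np

def cross_entropy_error(y, t):
    """Batch-averaged CEE; t may be one-hot or integer labels."""
    if y.ndim == 1:                # single example -> treat as a batch of 1
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)
    if t.size == y.size:           # one-hot -> convert to integer labels
        t = t.argmax(axis=1)
    batch_size = y.shape[0]
    # Fancy indexing picks y[i, t[i]], the output for each correct class.
    return -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size

t = np.array([2, 7, 0, 9, 4])            # integer labels, as on the slide
y = np.full((5, 10), 0.04)               # 0.04 on the nine wrong classes
y[np.arange(5), t] = 0.64                # 0.64 on each correct class

print(cross_entropy_error(y, t))         # ≈ -log(0.64) ≈ 0.446
```

The one-hot version of the same `t` gives the identical loss, since it is converted back to integer labels inside the function.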
9. ■Why a loss function? Why not accuracy?
The goal is to find parameter values that give high 'accuracy',
but learning searches for parameter values that make the loss function small.
• Because the search relies on the derivative: accuracy changes in discrete jumps as the parameters vary, so its derivative is 0 almost everywhere, while the loss changes continuously.
■ Where the derivative is negative, move the parameter in the positive direction.
■ Where it is positive, move in the negative direction.
For the same reason the sigmoid function is preferred over the step function as an activation: the step function's derivative is 0 everywhere except at 0, whereas the sigmoid's derivative is nonzero everywhere.
Loss function
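The step-vs-sigmoid point can be checked numerically; this is a small illustrative sketch using a central-difference derivative:

```python
import numpy as np

def step(x):
    return np.where(x > 0, 1.0, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def numerical_diff(f, x, h=1e-4):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in [-2.0, 0.5, 3.0]:
    print(x, numerical_diff(step, np.float64(x)),
          numerical_diff(sigmoid, np.float64(x)))
# Away from 0 the step function's derivative is exactly 0,
# so it gives no signal about which way to move a parameter;
# the sigmoid's derivative is small but never 0.
```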
11. ■Partial derivative
f(x0, x1) = x0² + x1²
• ∂f/∂x0 at (x0, x1) = (3.0, 4.0) → 6.0
• ∂f/∂x1 at (x0, x1) = (3.0, 4.0) → 8.0
Differentiation
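The partial derivatives above can be computed numerically for all components at once; a sketch using the central difference:

```python
import numpy as np

def f(x):
    """f(x0, x1) = x0^2 + x1^2"""
    return x[0] ** 2 + x[1] ** 2

def numerical_gradient(f, x, h=1e-4):
    """Central-difference partial derivative for each component of x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        orig = x[i]
        x[i] = orig + h
        fxh1 = f(x)          # f(..., x_i + h, ...)
        x[i] = orig - h
        fxh2 = f(x)          # f(..., x_i - h, ...)
        grad[i] = (fxh1 - fxh2) / (2 * h)
        x[i] = orig          # restore the original value
    return grad

print(numerical_gradient(f, np.array([3.0, 4.0])))  # [6. 8.]
```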
12. ■Gradient method
To find the minimum value of the loss function:
• use the gradient.
■ The gradient does not always point toward the minimum, but it is a useful hint.
Move a certain distance in the gradient direction, recompute the gradient, and repeat:
x ← x − η ∂f/∂x, where η is the learning rate (e.g., 0.01, 0.001, …)
Gradient method
Initial value: (−3.0, 4.0)
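Gradient descent on f(x0, x1) = x0² + x1² from the initial value (−3.0, 4.0) can be sketched as follows; the learning rate 0.1 and 100 steps are illustrative choices:

```python
import numpy as np

def f(x):
    return x[0] ** 2 + x[1] ** 2

def numerical_gradient(f, x, h=1e-4):
    """Central-difference gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        orig = x[i]
        x[i] = orig + h
        fxh1 = f(x)
        x[i] = orig - h
        fxh2 = f(x)
        grad[i] = (fxh1 - fxh2) / (2 * h)
        x[i] = orig
    return grad

def gradient_descent(f, init_x, lr=0.1, step_num=100):
    """Repeat x <- x - lr * grad f(x) for step_num steps."""
    x = init_x.copy()
    for _ in range(step_num):
        x -= lr * numerical_gradient(f, x)
    return x

x = gradient_descent(f, np.array([-3.0, 4.0]), lr=0.1, step_num=100)
print(x)  # very close to [0, 0], the true minimum
```

With a learning rate that is too large the iterates diverge, and with one that is too small they barely move, which is why values like 0.01 or 0.001 are tuned by hand.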