14. How does a Neural Network learn?
Calculate error
Sum of squares loss
Softmax loss
Cross entropy loss
Hinge loss
15. How does a Neural Network learn?
Sum of squares loss, a worked example:
Output of ANN: [0.2, 0.8]
Target value: [0.0, 1.0]
Sum of squares loss = (0.2 − 0.0)² + (0.8 − 1.0)² = 0.04 + 0.04 = 0.08
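As a minimal sketch, the same arithmetic in plain Python (values taken from the example above):

```python
# Sum-of-squares loss for the example: output of the ANN vs. the target value.
output = [0.2, 0.8]
target = [0.0, 1.0]

# (0.2 - 0.0)^2 = 0.04 and (0.8 - 1.0)^2 = 0.04, so the loss is 0.08.
loss = sum((o - t) ** 2 for o, t in zip(output, target))
print(loss)  # ~0.08
```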
17. What do we have to decide?
Gradient Descent Optimization Algorithms
• Batch Gradient Descent
• Stochastic Gradient Descent (SGD)
• Momentum
• Nesterov Accelerated Gradient (NAG)
• Adagrad
• RMSProp
• AdaDelta
• Adam
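A hedged sketch, assuming PyTorch, of how these optimizers are typically instantiated for a model's parameters; batch vs. stochastic gradient descent differ only in how much data feeds each update, not in the optimizer object itself:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder model; any nn.Module works here

sgd      = torch.optim.SGD(model.parameters(), lr=0.01)                               # (stochastic) gradient descent
momentum = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)                 # Momentum
nag      = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)  # Nesterov Accelerated Gradient
adagrad  = torch.optim.Adagrad(model.parameters(), lr=0.01)                           # Adagrad
rmsprop  = torch.optim.RMSprop(model.parameters(), lr=0.01)                           # RMSProp
adadelta = torch.optim.Adadelta(model.parameters())                                   # AdaDelta
adam     = torch.optim.Adam(model.parameters(), lr=0.001)                             # Adam
```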
18. What do we have to decide?
Neural network structure
• VGG-19
• GoogLeNet
Training techniques
• Dropout
• Sparsity
Loss function and cost function
• Cross entropy
• Sum of squares
Optimization algorithm
• Adam
• SGD
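As one illustration of how these decisions fit together, a minimal PyTorch sketch using dropout as the training technique, cross entropy as the loss, and Adam as the optimizer; the layer sizes here are hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, just to show where each decision plugs in.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),        # training technique: dropout
    nn.Linear(256, 10),
)
criterion = nn.CrossEntropyLoss()                           # loss function: cross entropy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # optimization algorithm: Adam
```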
19. Why is it hard to decide on a loss function?
In classification.
[Figure: the training loop — the input is fed to the NN to calculate the NN output; the output is compared with the target to calculate the loss; the loss is then used to update the weights of the NN.]
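The loop in the figure can be written as a short, hedged PyTorch sketch; the model, data, and loss below are placeholders, not the network used later in the slides:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                      # placeholder NN
criterion = nn.CrossEntropyLoss()            # placeholder classification loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)                        # input
target = torch.randint(0, 2, (8,))           # target labels

for step in range(100):
    output = model(x)                        # calculate the NN output
    loss = criterion(output, target)         # calculate the loss against the target
    optimizer.zero_grad()
    loss.backward()                          # gradients of the loss w.r.t. the weights
    optimizer.step()                         # update the weights of the NN using the loss
```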
20. Why is it hard to decide on a loss function?
In classification.
Output of NN: [0.67, 0.00, 0.02, 0.12, 0.04, 0.00, 0.03, 0.14]
Target: [1.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
Loss:
• Sum of L1 norm = 0.68
• Cross entropy = 2.45
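A small sketch that computes both quantities from the vectors above; the sum-of-L1 value matches the slide, while the cross-entropy number depends on the exact formulation (logarithm base, whether the outputs are renormalized), so it may differ from the 2.45 shown here:

```python
import math

output = [0.67, 0.00, 0.02, 0.12, 0.04, 0.00, 0.03, 0.14]  # output of NN
target = [1.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]  # one-hot target

# Sum of L1 norm: sum of absolute differences between output and target.
l1 = sum(abs(o - t) for o, t in zip(output, target))
print(l1)  # ~0.68

# Cross entropy: -sum_j t_j * log(p_j); only the true class contributes.
eps = 1e-12  # guard against log(0)
ce = -sum(t * math.log(o + eps) for o, t in zip(output, target))
print(ce)   # ~0.40 with the natural logarithm
```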
21. Back Propagation Ⅰ
1. Forward Propagation
2. Back Propagation for Neural Nets
3. Back Propagation for Pooling
4. Back Propagation for Convolution
22. Problem
Problem: XOR
Data set:
data target
[0, 0] [1, 0]
[0, 1] [0, 1]
[1, 0] [0, 1]
[1, 1] [1, 0]
Layer structure: input layer → hidden layer → output layer
• Input layer
# of node: 2
• Hidden layer
# of node: 3
activation function: logistic sigmoid
• Output layer
# of node: 2
activation function: logistic sigmoid
cost function: sum of squares
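A minimal PyTorch sketch of this exact setup (2-3-2 layers, logistic sigmoid activations, sum-of-squares cost); the learning rate and iteration count are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

# XOR data set with the one-hot targets from the table above.
data = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
target = torch.tensor([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])

model = nn.Sequential(
    nn.Linear(2, 3), nn.Sigmoid(),   # hidden layer: 3 nodes, logistic sigmoid
    nn.Linear(3, 2), nn.Sigmoid(),   # output layer: 2 nodes, logistic sigmoid
)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)

for step in range(5000):
    out = model(data)
    loss = ((out - target) ** 2).sum()   # sum-of-squares cost
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(model(data))   # outputs should approach the one-hot targets
```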
23. Notation
$v_1^2$ — the sum of a neuron's inputs, i.e. the value before the activation function; the superscript (2) is the index of the layer and the subscript (1) is the index of the neuron within that layer.
$y_1^2$ — the output of a neuron, i.e. the value after the activation function; superscript and subscript are again the layer index and the neuron index.
$w_{2 \to 1}^2$ — a weight; the superscript (2) is the index of the layer, and the subscript reads "from neuron 2 to neuron 1".
45. practice
Data set: CIFAR-10
Problem: a 10-class classification problem
Input domain: 32×32, 3-channel images
Approach: parametric model
Estimation method: MLE (the frequentist's way)
Adapted from the torch homepage
Adapted from Wikipedia (Maximum entropy probability distribution)
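A hedged sketch of loading CIFAR-10 with torchvision, following the usual recipe from the PyTorch tutorials; the normalization values are one common choice, not a requirement:

```python
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# 32x32, 3-channel images; normalization values are one common choice.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=4, shuffle=True)
```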
52. practice
Categorical distribution (if n = 1 in the multinomial distribution):
• Parameter: $p_i$
• Natural parameter: $\eta_i = \ln \frac{p_i}{1 - \sum_{j=1}^{k-1} p_j}$
• Inverse parameter mapping: $p_i = \frac{e^{\eta_i}}{\sum_{j=1}^{k} e^{\eta_j}}$
Adapted from Wikipedia (exponential family)
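A short sketch of the two mappings for a small categorical distribution; the last class serves as the reference, so its natural parameter is 0 and the softmax recovers the original parameters exactly:

```python
import math

p = [0.2, 0.3, 0.5]   # parameters p_i of a categorical distribution (k = 3)

# Natural parameters: eta_i = ln( p_i / (1 - sum_{j=1}^{k-1} p_j) ) = ln(p_i / p_k)
p_k = 1 - sum(p[:-1])
eta = [math.log(pi / p_k) for pi in p]

# Inverse parameter mapping (softmax): p_i = exp(eta_i) / sum_j exp(eta_j)
z = sum(math.exp(e) for e in eta)
p_recovered = [math.exp(e) / z for e in eta]
print(p_recovered)    # recovers [0.2, 0.3, 0.5]
```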
53. practice
Objective function: maximize the likelihood → maximize the log-likelihood → minimize the negative log-likelihood:
$\min_{\theta} \; -\log P(y \mid x; \eta)$
$-\log P(y \mid x; \eta) = -\log \prod_{i=1}^{m} \prod_{j=1}^{k} p_j^{\,y_{i,j}}$, if we have $m$ samples and the samples are i.i.d.
X: input data (features of the input data), here an image
Y: label
$\eta$: natural parameter
$\min_{\theta} \sum_{i=1}^{m} \sum_{j=1}^{k} -y_{i,j} \log p_j$ → cross entropy
Why do we use the natural parameter?
• $\eta$ is called the natural parameter. The set of values of $\eta$ for which the function $f_X(x; \theta)$ is finite is called the natural parameter space. It can be shown that the natural parameter space is always convex.
• It is also called the canonical parameter.
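A hedged PyTorch sketch of this equivalence: the negative log-likelihood written out from the softmax of the natural parameters equals the library's cross entropy on the same logits (the numbers below are made up for illustration):

```python
import torch
import torch.nn.functional as F

eta = torch.tensor([[1.2, -0.3, 0.4]])   # natural parameters (logits) for one sample, k = 3
y = torch.tensor([0])                    # index of the true class

# Negative log-likelihood written out: -sum_j y_j * log p_j with p = softmax(eta).
p = F.softmax(eta, dim=1)
nll_manual = -torch.log(p[0, y[0]])

# The same quantity via the library's cross entropy (softmax + NLL in one call).
nll_library = F.cross_entropy(eta, y)

print(nll_manual.item(), nll_library.item())  # the two values agree
```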
54. practice
Overall model
[Figure: input image → extracting features from the input domain (Conv) → distribution estimation (fully connected, weights: η) → output layer (softmax, η → p) → loss function (cross entropy).]
Forward propagation step: we set the weights of the fully connected network as the natural parameters of the categorical distribution, so the activation function of the output layer can be a softmax function. This is because the loss function is the cross entropy of the p vector and the label vector, and we assume that the weights of the fully connected layer are the natural parameters of the multinomial distribution.
Back propagation step: the weights are updated by the optimizer (SGD or Adam).
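A minimal, hedged sketch of the overall model described above (conv feature extraction → fully connected layer producing η → softmax plus cross entropy → optimizer update); the layer sizes are illustrative, not the exact network built in the next session:

```python
import torch
import torch.nn as nn

class SmallCifarNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Extracting features from the input domain (Conv)
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        # Distribution estimation (fully connected); its output plays the role of eta
        self.fc = nn.Linear(32 * 8 * 8, 10)

    def forward(self, x):
        x = self.features(x)
        return self.fc(x.flatten(1))       # eta: natural parameters for 10 classes

model = SmallCifarNet()
# CrossEntropyLoss applies softmax (eta -> p) and the cross-entropy loss together.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # or torch.optim.SGD(...)

images = torch.randn(4, 3, 32, 32)         # a fake batch of CIFAR-10-sized images
labels = torch.randint(0, 10, (4,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```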
55. Schedule
Next session: actually building a classifier on the CIFAR-10 dataset.
Prerequisite: a PyTorch development environment (installing it in an Anaconda environment is recommended).