Deep learning study 2
Confusion Matrix
Adopted from Kwangil Kim's machine learning class material
Confusion Matrix
Adopted from Kwangil Kim's machine learning class material
ROC (Receiver Operating Characteristic)
Adopted from Kwangil Kim's machine learning class material
Parametric vs Non-parametric models
Adopted from Kwangil Kim's machine learning class material
Example of Non-parametric models
Adopted from Wikipedia
Central limit theorem
Adopted from Wikipedia
Markov chain
Adopted from Wikipedia
What is a Neural Network?
How does a Neural Network learn?
Preparing input and target pairs: each input label (Lion, Cat, Dog) is mapped to an integer target (0, 1, 2), which is then one-hot encoded into the target vector.
How does a Neural Network learn?
The weights of the network are arbitrarily set (e.g. 0.6, 0.2, 0.3, 0.9, 0.1 in the diagram).
How does a Neural Network learn?
Feed Forward

How does a Neural Network learn?
Feed Forward: with previous-layer outputs (0.2, 0.1, 0.6, 0.3) and incoming weights (0.2, 0.7, 0.3, 0.1), the input to neuron N21 is
$$sum: 0.2 \times 0.2 + 0.1 \times 0.7 + 0.6 \times 0.3 + 0.3 \times 0.1 = 0.32$$
$$\text{Output of } N_{21} = f(0.32), \quad f \text{ is the activation function of } N_{21}$$
$$\text{Output of } N_{21} = f(0.32) = 0.1024 \text{ if } f(x) = x^2$$
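The same computation in a few lines of Python (a minimal sketch; the pairing of inputs and weights is taken from the sum on the slide):

```python
import numpy as np

# The weighted-sum step above, using the slide's numbers.
inputs = np.array([0.2, 0.1, 0.6, 0.3])    # outputs of the previous layer
weights = np.array([0.2, 0.7, 0.3, 0.1])   # weights into neuron N21

s = inputs @ weights        # 0.2*0.2 + 0.1*0.7 + 0.6*0.3 + 0.3*0.1 = 0.32
f = lambda x: x ** 2        # the slide's toy activation function
print(s, f(s))              # 0.32 0.1024
```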
How does a Neural Network learn?
Calculate error
Sum of squares loss
Softmax loss
Cross entropy loss
Hinge loss
How does a Neural Network learn?
Sum of squares loss (one choice among softmax loss, cross entropy loss, and hinge loss):
Output of ANN: (0.2, 0.8); target value: (0.0, 1.0)
$$\text{Sum of squares loss} = (0.2 - 0.0)^2 + (0.8 - 1.0)^2 = 0.04 + 0.04 = 0.08$$
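The slide's loss computation, reproduced as a sketch in Python:

```python
import numpy as np

# Sum-of-squares loss for the example: output (0.2, 0.8), target (0.0, 1.0).
output = np.array([0.2, 0.8])
target = np.array([0.0, 1.0])
loss = np.sum((output - target) ** 2)   # 0.04 + 0.04 = 0.08
print(loss)
```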
How does a Neural Network learn?
Feedback
What do we have to decide?
Gradient Descent Optimization Algorithms (the two simplest update rules are sketched after this list):
• Batch Gradient Descent
• Stochastic Gradient Descent (SGD)
• Momentum
• Nesterov Accelerated Gradient (NAG)
• Adagrad
• RMSProp
• AdaDelta
• Adam
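A minimal sketch (not from the slides) of the two simplest update rules; `grad` is an assumed placeholder returning the loss gradient at the current weights:

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # vanilla SGD: step against the gradient
    return w - lr * grad(w)

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    v = beta * v - lr * grad(w)   # accumulate an exponentially decaying velocity
    return w + v, v

# usage on a toy quadratic loss 0.5 * ||w||^2, whose gradient is w
w, v = np.ones(3), np.zeros(3)
for _ in range(100):
    w, v = momentum_step(w, v, grad=lambda w: w)
```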
What do we have to decide?
Neural network structure
• VGG-19
• GoogLeNet
Training techniques
• Dropout
• Sparsity
Loss function and cost function
• Cross entropy
• Sum of squares
Optimization algorithm
• Adam
• SGD
Why is it hard to decide on a loss function?
In classification, the training loop is:
1. Feed an input through the NN and calculate its output.
2. Compare the output of the NN with the target to calculate the loss.
3. Update the weights of the NN using the loss.
Why is it hard to decide on a loss function?
In classification, different loss functions score the same prediction very differently.

Output of NN: 0.67, 0.00, 0.02, 0.12, 0.04, 0.00, 0.03, 0.14
Target:       1.0, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00

Loss: sum of L1 norm = 0.68, cross entropy = 2.45
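Recomputing both losses in Python for the example above. Note the slide's cross-entropy value (2.45) presumably comes from quantities not shown on the slide; from the printed probabilities alone, $-\ln 0.67 \approx 0.40$:

```python
import numpy as np

# A sketch comparing the two losses on the example output/target pair.
p = np.array([0.67, 0.00, 0.02, 0.12, 0.04, 0.00, 0.03, 0.14])  # output of NN
t = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])          # target

l1 = np.sum(np.abs(p - t))            # sum of L1 norm: 0.68
ce = -np.sum(t * np.log(p + 1e-12))   # cross entropy (epsilon avoids log(0))
print(l1, ce)
```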
Back Propagation I
1. Forward Propagation
2. Back Propagation for Neural Nets
3. Back Propagation for Pooling
4. Back Propagation for Convolution
Problem
Problem: XOR
Data set (data → target):
[0, 0] → [1, 0]
[0, 1] → [0, 1]
[1, 0] → [0, 1]
[1, 1] → [1, 0]
Layer structure: input layer → hidden layer → output layer
• Input layer
# of nodes: 2
• Hidden layer
# of nodes: 3
activation function: logistic sigmoid
• Output layer
# of nodes: 2
activation function: logistic sigmoid
cost function: sum of squares
Notation
$v_j^i$: the sum of a neuron's inputs, before the activation function; $i$ is the index of the layer, $j$ the index of the neuron in that layer.
$y_j^i$: the output of a neuron, after the activation function; $i$ is the index of the layer, $j$ the index of the neuron in that layer.
$w_{k \to j}^i$: the weight from neuron $k$ (in layer $i-1$) to neuron $j$ (in layer $i$).
Forward Propagation
Network diagram (legend: input nodes, hidden nodes, output nodes, bias inputs, cost function):
Layer 0 (input layer): $y_1^0$, $y_2^0$, with bias input $y_0^0 = 1$.
Layer 1 (hidden layer): $v_j^1$ and $y_j^1 = \varphi(v_j^1)$ for $j = 1, 2, 3$, with bias input $y_0^1 = 1$.
Layer 2 (output layer): $v_j^2$ and $y_j^2 = \varphi(v_j^2)$ for $j = 1, 2$.
Cost part: the outputs $y_1^2, y_2^2$ are compared with the targets $t_1, t_2$ to produce the cost $E$.
Forward Propagation
$$v_j^i = \sum_{k=0}^{M_{i-1}} w_{k \to j}^i \times y_k^{i-1}, \quad \text{where } M_n \text{ is the number of nodes in layer } n$$
$$y_j^i = \varphi(v_j^i), \quad \text{where } \varphi(\cdot) \text{ is the activation function}$$
$$E = \sum_{k=1}^{M_L} (y_k^L - t_k)^2, \quad \text{where } L \text{ is the index of the output layer}$$
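A minimal sketch of this forward pass for the 2-3-2 XOR network in Python (weight values are random, matching the "arbitrarily set" initialization; layer sizes follow the problem statement):

```python
import numpy as np

# Forward pass for the 2-3-2 network; weight matrices include a bias column.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))   # layer 1: 3 hidden neurons x (bias + 2 inputs)
W2 = rng.normal(size=(2, 4))   # layer 2: 2 output neurons x (bias + 3 hidden)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(x):
    y0 = np.concatenate(([1.0], x))              # prepend bias input y_0^0 = 1
    v1 = W1 @ y0                                 # v_j^1 = sum_k w_{k->j}^1 y_k^0
    y1 = np.concatenate(([1.0], sigmoid(v1)))    # bias y_0^1 = 1, then y_j^1
    v2 = W2 @ y1
    return sigmoid(v2)                           # network outputs y_1^2, y_2^2

print(forward(np.array([0.0, 1.0])))  # should approach [0, 1] after training
```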
Back Propagation for Neural Nets
Define the error of output neuron $k$ as
$$e_k^L = y_k^L - t_k, \qquad E = \sum_{k=1}^{M_L} (e_k^L)^2$$
Total error in this example: $E = (e_1^2)^2 + (e_2^2)^2$.
Back Propagation for Neural Nets
Minimize $E = (e_1^2)^2 + (e_2^2)^2$ by stepping in the direction of the negative gradient, i.e. along $-\frac{\partial E}{\partial e_1}$ and $-\frac{\partial E}{\partial e_2}$.
Back Propagation for Neural Nets
Back Propagation for Neural Nets
For the example weight $w_{2 \to 1}^2$, the chain rule gives
$$\Delta w_{2 \to 1}^2 = -\eta \, \frac{\partial E}{\partial e_1^2} \frac{\partial e_1^2}{\partial y_1^2} \frac{\partial y_1^2}{\partial v_1^2} \frac{\partial v_1^2}{\partial w_{2 \to 1}^2}$$
and in general, for an output-layer weight,
$$\Delta w_{k \to j}^L = -\eta \, \frac{\partial E}{\partial e_j^L} \frac{\partial e_j^L}{\partial y_j^L} \frac{\partial y_j^L}{\partial v_j^L} \frac{\partial v_j^L}{\partial w_{k \to j}^L}$$
The first three factors form the local gradient $\delta_j^L$ (here $\delta_1^2$); $\eta$ is the learning rate.
Back Propagation for Neural Nets
$$\frac{\partial E}{\partial e_j^L} = 2 e_j^L \quad \because E = \sum_{k=1}^{M_L} (e_k^L)^2$$
$$\frac{\partial e_j^L}{\partial y_j^L} = 1 \quad \because e_j^L = y_j^L - t_j$$
$$\frac{\partial y_j^L}{\partial v_j^L} = \varphi'(v_j^L) \quad \because y_j^L = \varphi(v_j^L)$$
$$\frac{\partial v_j^L}{\partial w_{k \to j}^L} = y_k^{L-1} \quad \because v_j^L = \sum_{l=0}^{M_{L-1}} w_{l \to j}^L \times y_l^{L-1}$$
Combining these,
$$\Delta w_{k \to j}^L = \eta \, \delta_j^L \, y_k^{L-1} = -2 \eta \, e_j^L \, \varphi'(v_j^L) \, y_k^{L-1}, \qquad \delta_j^L = -2 e_j^L \varphi'(v_j^L)$$
For the logistic sigmoid, $\varphi'(v_j^L) = \varphi(v_j^L)(1 - \varphi(v_j^L)) \quad \because \varphi(v_j^L) = \frac{1}{1 + e^{-v_j^L}}$
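A sketch of the output-layer rule just derived, keeping the slide's sign convention ($\delta = -\partial E / \partial v$, so the update adds $\eta \, \delta_j^L \, y_k^{L-1}$); the numeric inputs below are hypothetical:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def output_delta(e, v):
    # delta_j^L = -2 * e_j^L * phi'(v_j^L), with phi the logistic sigmoid
    phi = sigmoid(v)
    return -2.0 * e * phi * (1.0 - phi)

# hypothetical values: error e_1^2 = 0.3, pre-activation v_1^2 = 0.5,
# previous-layer output y_2^1 = 0.7, learning rate eta = 0.1
delta = output_delta(0.3, 0.5)
dw = 0.1 * delta * 0.7        # Delta w_{2->1}^2 = eta * delta_1^2 * y_2^1
```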
Back Propagation for Neural Nets
Applying this to the example weight:
$$\Delta w_{2 \to 1}^2 = \eta \, \delta_1^2 \, y_2^1 = -2 \eta \, e_1^2 \, \varphi'(v_1^2) \, y_2^1, \qquad \delta_1^2 = -2 e_1^2 \varphi'(v_1^2)$$
Back Propagation for Neural Nets
For a hidden neuron, e.g. the local gradient of $v_2^1$:
$$\delta_2^1 = -\frac{\partial E}{\partial v_2^1} = -\frac{\partial \sum_{k=1}^{2} (e_k^2)^2}{\partial v_2^1} = \sum_{k=1}^{2} -\frac{\partial (e_k^2)^2}{\partial e_k^2} \frac{\partial e_k^2}{\partial v_2^1}$$
$$= \sum_{k=1}^{2} -\frac{\partial (e_k^2)^2}{\partial e_k^2} \frac{\partial e_k^2}{\partial y_k^2} \frac{\partial y_k^2}{\partial v_k^2} \frac{\partial v_k^2}{\partial y_2^1} \frac{\partial y_2^1}{\partial v_2^1} = \sum_{k=1}^{2} \delta_k^2 \, \frac{\partial v_k^2}{\partial y_2^1} \, \varphi'(v_2^1)$$
$$= \varphi'(v_2^1) \sum_{k=1}^{2} \delta_k^2 \, w_{2 \to k}^2 \quad \because v_k^2 = \sum_{m=0}^{3} y_m^1 \, w_{m \to k}^2$$
Back Propagation for Neural Nets
In general, for a hidden neuron $m$ in layer $i$:
$$\delta_m^i = \varphi'(v_m^i) \sum_{k=1}^{M_{i+1}} \delta_k^{i+1} \, w_{m \to k}^{i+1}$$
$$\Delta w_{j \to m}^i = \eta \, \delta_m^i \, y_j^{i-1}$$
These two rules propagate the local gradients back through the whole network, down to the input-layer weights $w_{1 \to 1}^1, w_{1 \to 2}^1, w_{1 \to 3}^1$.
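A compact sketch training the 2-3-2 XOR network with the two rules above, written with matrices instead of per-neuron sums (hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))     # hidden weights, incl. bias column
W2 = rng.normal(size=(2, 4))     # output weights, incl. bias column
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
eta = 0.5

for epoch in range(10000):
    for x, t in zip(X, T):
        y0 = np.concatenate(([1.0], x))                # bias + inputs
        y1 = np.concatenate(([1.0], sigmoid(W1 @ y0)))
        y2 = sigmoid(W2 @ y1)
        e = y2 - t                                     # e_k^L = y_k^L - t_k
        d2 = -2.0 * e * y2 * (1.0 - y2)                # output local gradients
        d1 = y1[1:] * (1.0 - y1[1:]) * (W2[:, 1:].T @ d2)  # hidden local gradients
        W2 += eta * np.outer(d2, y1)                   # Delta w = eta * delta * y
        W1 += eta * np.outer(d1, y0)

for x in X:
    y1 = np.concatenate(([1.0], sigmoid(W1 @ np.concatenate(([1.0], x)))))
    print(x, np.round(sigmoid(W2 @ y1), 2))
```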
Back Propagation for Neural Nets
Local gradients at elementary nodes:
Addition node: $v = y_{11} + y_{12}$, so $\frac{dv}{dy_{11}} = \frac{dv}{dy_{12}} = 1$ and the incoming gradient is copied: $\delta_{11} = \delta_{12} = \delta$.
Multiplication node: $v = y_{11} w_{11}$, so $\frac{dv}{dy_{11}} = w_{11}$ and $\frac{dv}{dw_{11}} = y_{11}$, giving $\delta_y = w_{11}\delta$ and $\delta_w = y_{11}\delta$ (each operand receives the gradient scaled by the other).
Back Propagation for Pooling
Max Pooling: a 2×2 window [[88, 92], [81, 96]] forwards its maximum, 96. In the backward pass, the incoming gradient $\delta$ goes only to the position of the maximum: $\delta_{11} = \delta_{12} = \delta_{21} = 0$, $\delta_{22} = \delta$.
Average Pooling: a 2×2 window [[88, 92], [84, 96]] forwards its average, 90. In the backward pass, the gradient is spread uniformly: $\delta_{11} = \delta_{12} = \delta_{21} = \delta_{22} = \frac{1}{4}\delta$.
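A sketch of gradient routing through a single 2×2 window, matching the two rules above:

```python
import numpy as np

def max_pool_backward(window, delta):
    # route the gradient only to the argmax position
    grad = np.zeros_like(window)
    grad[np.unravel_index(np.argmax(window), window.shape)] = delta
    return grad

def avg_pool_backward(window, delta):
    # spread the gradient uniformly over the window
    return np.full_like(window, delta / window.size)

w = np.array([[88.0, 92.0], [81.0, 96.0]])
print(max_pool_backward(w, 1.0))   # [[0, 0], [0, 1]]
print(avg_pool_backward(w, 1.0))   # [[0.25, 0.25], [0.25, 0.25]]
```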
Back Propagation for Convolution
A 3×3 input $X = (x_{11} \dots x_{33})$ convolved with a 2×2 kernel $W = (w_{11} \dots w_{22})$ gives a 2×2 output $Y = (y_{11}, y_{12}, y_{21}, y_{22})$, with output gradients $\delta_1, \delta_2, \delta_3, \delta_4$ and input gradients $\delta_{11} \dots \delta_{33}$:
$$\delta_{11} = w_{11} \delta_1$$
$$\delta_{22} = w_{22} \delta_1 + w_{21} \delta_2 + w_{12} \delta_3 + w_{11} \delta_4$$
$$\Delta w_{11} = \eta (x_{11} \delta_1 + x_{12} \delta_2 + x_{21} \delta_3 + x_{22} \delta_4)$$
$$\Delta w_{21} = \eta (x_{21} \delta_1 + x_{22} \delta_2 + x_{31} \delta_3 + x_{32} \delta_4)$$
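A sketch of the kernel update above: each $\Delta w_{ij}$ is a valid cross-correlation of the corresponding input patch with the output deltas (the input and delta values below are hypothetical):

```python
import numpy as np

x = np.arange(1.0, 10.0).reshape(3, 3)          # x11..x33 (example values)
d = np.array([[0.1, -0.2], [0.3, 0.05]])        # output deltas d1..d4

dW = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        # e.g. dW[0,0] = x11*d1 + x12*d2 + x21*d3 + x22*d4
        dW[i, j] = np.sum(x[i:i+2, j:j+2] * d)

eta = 0.1
delta_w = eta * dW          # Delta w for the whole 2x2 kernel
```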
Linear activation function
With linear activations, stacked layers collapse: for a network with input layer, hidden layers 1 to 3, and output layer,
$$Output = Input \times W_1 \times W_2 \times W_3 = Input \times W$$
so it is equivalent to a single-hidden-layer network $Output = Input \times W$ with $W = W_1 W_2 W_3$. This is why nonlinear activation functions are needed.
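A quick numerical check of this collapse (shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 5))
W2 = rng.normal(size=(5, 5))
W3 = rng.normal(size=(5, 3))
x = rng.normal(size=(1, 4))

W = W1 @ W2 @ W3                              # the collapsed single matrix
print(np.allclose(x @ W1 @ W2 @ W3, x @ W))   # True
```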
Convolution
Adopted from Wikipedia
Convolution
Adopted from https://pgaleone.eu/neural-networks/2016/11/24/convolutional-autoencoders/
Adopted from Apple developer documentation archive
Convolution
Adopted from Apple developer documentation archive
Convolution
• Filter size
• Stride
• Padding
• Dilation
• Transposed convolution (Deconv.)
No padding, no stride
No padding, stride
Same padding, no stride
Padding, stride
Arbitrary padding, no stride
Convolution
Padding, strides, transposed
No padding, strides, transposed
Full padding, no strides, transposed
No padding, no strides, transposed
Convolution
No padding, no strides, dilation
All convolution animations are adopted from https://github.com/vdumoulin/conv_arithmetic
Pooling
Adopted from https://stackoverflow.com/questions/44287965/trying-to-confirm-average-pooling-is-equal-to-dropping-high-frequency-fourier-co
Practice
Data set: CIFAR-10
Problem: classification with 10 classes
Input domain: 32×32 3-channel images
Approach: parametric model
Estimation method: MLE (the frequentist's way)
Adopted from the torch homepage
Adopted from Wikipedia (Maximum entropy probability distribution)
Practice
Maximum entropy distribution: the multinomial distribution, with n = 1 for one sample and k = the number of classes.
Practice
PMF:
$$\frac{n!}{x_1! \cdots x_k!} \, p_1^{x_1} \cdots p_k^{x_k}$$
which reduces to $p_1^{x_1} \cdots p_k^{x_k}$ if $n$ is 1.
Example: $x_1, \cdots, x_k = 0, 1, \cdots, 0$ and $p_1, \cdots, p_k = 0.1, 0.3, \cdots, 0.02$.
Practice
Rewriting the PMF in exponential-family form:
$$\frac{n!}{\prod_{j=1}^{k} x_j!} \prod_{j=1}^{k} p_j^{x_j} = \frac{n!}{\prod_{j=1}^{k} x_j!} \exp\left( \sum_{j=1}^{k} x_j \ln p_j \right)$$
$$= \frac{n!}{\prod_{j=1}^{k} x_j!} \exp\left( \sum_{j=1}^{k-1} x_j \ln p_j + \left(n - \sum_{j=1}^{k-1} x_j\right) \ln\left(1 - \sum_{j=1}^{k-1} p_j\right) \right)$$
$$= \frac{n!}{\prod_{j=1}^{k} x_j!} \exp\left( \sum_{j=1}^{k-1} x_j \ln p_j - \sum_{j=1}^{k-1} x_j \ln\left(1 - \sum_{j=1}^{k-1} p_j\right) + n \ln\left(1 - \sum_{j=1}^{k-1} p_j\right) \right)$$
$$= \frac{n!}{\prod_{j=1}^{k} x_j!} \exp\left( \sum_{j=1}^{k-1} x_j \ln \frac{p_j}{1 - \sum_{j=1}^{k-1} p_j} + n \ln\left(1 - \sum_{j=1}^{k-1} p_j\right) \right)$$
Practice
$$\frac{n!}{\prod_{j=1}^{k} x_j!} \exp\left( \sum_{j=1}^{k-1} x_j \ln \frac{p_j}{1 - \sum_{j=1}^{k-1} p_j} + n \ln\left(1 - \sum_{j=1}^{k-1} p_j\right) \right)$$
matches the exponential-family form $f_X(x \mid \theta) = h(x) \exp(\eta(\theta) \cdot T(x) - A(\theta))$ with
• $h(x) = \frac{n!}{\prod_{j=1}^{k} x_j!}$
• $\eta(\theta) = \left[ \ln \frac{p_1}{1 - \sum_{j=1}^{k-1} p_j}, \cdots, \ln \frac{p_{k-1}}{1 - \sum_{j=1}^{k-1} p_j}, 0 \right]$
• $T(x) = [x_1, \cdots, x_k]$
• $A(\theta) = -n \ln\left(1 - \sum_{j=1}^{k-1} p_j\right)$
Practice
Adopted from Wikipedia (exponential family)
$$\eta(\theta) = \left[ \ln \frac{p_1}{1 - \sum_{j=1}^{k-1} p_j}, \cdots, \ln \frac{p_{k-1}}{1 - \sum_{j=1}^{k-1} p_j}, 0 \right]$$
$$\eta_i = \begin{cases} \ln \frac{p_i}{1 - \sum_{j=1}^{k-1} p_j}, & \text{if } i < k \\ 0, & \text{otherwise} \end{cases}$$
Practice
Inverting the parameter mapping:
$$e^{\eta_i} = \frac{p_i}{1 - \sum_{j=1}^{k-1} p_j} \;\Longrightarrow\; \sum_{i=1}^{k} e^{\eta_i} = \sum_{i=1}^{k} \frac{p_i}{1 - \sum_{j=1}^{k-1} p_j} = \frac{\sum_{i=1}^{k} p_i}{1 - \sum_{j=1}^{k-1} p_j} = \frac{1}{1 - \sum_{j=1}^{k-1} p_j}$$
$$\Longrightarrow\; 1 - \sum_{j=1}^{k-1} p_j = \frac{1}{\sum_{i=1}^{k} e^{\eta_i}} \;\Longrightarrow\; e^{\eta_i} = p_i \sum_{i=1}^{k} e^{\eta_i} \;\Longrightarrow\; p_i = \frac{e^{\eta_i}}{\sum_{i=1}^{k} e^{\eta_i}}$$
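The inverse mapping just derived is the softmax function; a numerically stable sketch (subtracting the max leaves $p_i$ unchanged):

```python
import numpy as np

def softmax(eta):
    z = np.exp(eta - np.max(eta))   # shift for numerical stability
    return z / np.sum(z)

eta = np.array([2.0, 0.5, -1.0, 0.0])
p = softmax(eta)            # p_i = e^{eta_i} / sum_j e^{eta_j}
print(p, p.sum())           # probabilities summing to 1
```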
Practice
• Natural parameter: $\eta_i = \ln \frac{p_i}{1 - \sum_{j=1}^{k-1} p_j}$
• Inverse parameter mapping: $p_i = \frac{e^{\eta_i}}{\sum_{i=1}^{k} e^{\eta_i}}$
• Parameter: $p_i$
This is the categorical distribution (the multinomial distribution with n = 1).
Adopted from Wikipedia (exponential family)
Practice
Objective function: maximize the likelihood → maximize the log-likelihood → minimize the negative log-likelihood:
$$\min_\theta -\log P(y \mid x; \eta)$$
$$-\log P(y \mid x; \eta) = -\log \prod_{i=1}^{m} \prod_{j=1}^{k} p_j^{y_{i,j}}, \quad \text{if we have } m \text{ samples and the samples are i.i.d.}$$
where X is the input data (the features of the input, i.e. the image), Y is the label, and $\eta$ is the natural parameter. This gives
$$\min_\theta \sum_{i=1}^{m} \sum_{j=1}^{k} -y_{i,j} \log p_j \;\rightarrow\; \text{cross entropy}$$
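A sketch of this objective for m one-hot labeled samples (the example values are hypothetical):

```python
import numpy as np

def cross_entropy(P, Y, eps=1e-12):
    # - sum_i sum_j y_ij * log p_ij over m samples and k classes
    return -np.sum(Y * np.log(P + eps))

P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])   # predicted class probabilities
Y = np.array([[1, 0, 0],
              [0, 1, 0]])         # one-hot labels
print(cross_entropy(P, Y))        # -log 0.7 - log 0.8
```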
Why do we use the natural parameter?
• $\eta$ is called the natural parameter. The set of values of $\eta$ for which $f_X(x; \theta)$ is finite is called the natural parameter space. It can be shown that the natural parameter space is always convex.
• It is also the canonical parameter.
Practice
Overall model:
Input image → extracting features from the input domain (Conv) → distribution estimation (fully connected, weights producing $\eta$) → output layer (softmax, $\eta \to p$) → loss function (cross entropy)
Forward propagation step: we treat the output of the fully connected network as the natural parameter of a categorical distribution, so the activation function of the output layer can be a softmax function. The loss function is then the cross entropy of the p vector and the label vector, because we assume the fully connected layer outputs the natural parameter of the multinomial distribution.
Back propagation step: the weights are updated by an optimizer (SGD or Adam).
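A minimal PyTorch sketch of this pipeline (layer sizes are illustrative, not from the slides); note that `nn.CrossEntropyLoss` applies the softmax internally, so the fully connected layer outputs the natural parameters (logits) directly:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),  # feature extraction
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),         # distribution estimation: eta (logits)
)
criterion = nn.CrossEntropyLoss()        # softmax + cross entropy (eta -> p)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(8, 3, 32, 32)            # a batch of CIFAR-10-sized images
y = torch.randint(0, 10, (8,))
loss = criterion(model(x), y)
loss.backward()                          # back propagation
optimizer.step()                         # weight update (SGD or Adam)
```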
Schedule
Next session: actually building a classifier on the CIFAR-10 dataset.
Prerequisite: a PyTorch development environment (installing via an Anaconda environment is recommended).