1. National Agriculture and Food Research Organization
2020年10月22日~ 2021年3月25日
PyTorchで始める深層学習を用いた物体認識
- シーズン1:ML基本からResNetまで-
趙 元在(チョ ウォンジェ)
農業・食品産業技術総合研究機構
農業技術革新工学研究センター
高度作業支援システム研究領域
高度土地利用型作業ユニット
chou555@affrc.go.jp
2. National Agriculture and Food Research Organization
Goals
• Basic understanding of machine learning algorithms
➢ Linear regression, Logistic regression (classification)
➢ Neural networks, Convolutional Neural Network (CNN)
• Basic understanding of CNN architecture for Object Detection
➢ AlexNet, VGG, ResNet, GoogLeNet / InceptionNet, R-CNN, YOLO, SSD …
• Solve your problems using machine learning tools
➢ PyTorch and Python (ver. 3.x)
chou555@affrc.go.jp
3. National Agriculture and Food Research Organization
Course structure
• About 30 min Lecture (Theory: 15min, Practice: 15min)
• Programming tutorial using PyTorch
chou555@affrc.go.jp
4. National Agriculture and Food Research Organization
Schedule
• Machine learning basic concepts and practice Python (ver3.x)
• Linear regression
• Logistic regression (classification)
• Multivariable (Vector) linear / logistic regression
• Neural networks and CNN
• Deep learning for object detection
➢ AlexNet
➢ VGG
➢ ResNet
➢ GoogLeNet / InceptionNet
➢ R-CNN
➢ YOLO
➢ SSD
➢ SOTA(State of the art) Review: EfficientDet, YOLO-v4, etc.
Paper list from 2014 to 2019
https://github.com/hoya012/deep_learning_object_detection
chou555@affrc.go.jp
5. National Agriculture and Food Research Organization
Acknowledgement
• Deep learning zero to all (Korean / English)
➢ https://hunkim.github.io/ml/
• Dive into Deep Learning (English)
➢ https://d2l.ai/index.html
• CS231n: Convolutional Neural Networks for Visual Recognition (English)
➢ http://cs231n.stanford.edu/
➢ https://cs231n.github.io/
➢ http://cs231n.stanford.edu/slides/
• Andrew Ng’s Machine Learning Class (English)
➢ https://www.coursera.org/learn/machine-learning
➢ http://holehouse.org/mlclass/
• PyTorch Tutorials
➢ https://pytorch.org/tutorials/ (English)
➢ https://github.com/omerbsezer/Fast-Pytorch (English)
➢ https://github.com/GunhoChoi/PyTorch-FastCampus (Korean / English)
➢ https://wikidocs.net/book/2788 (Korean)
chou555@affrc.go.jp
6. National Agriculture and Food Research Organization
Machine Learning Basics: Concepts
• What is ML?
• What is learning?
➢ supervised
➢ unsupervised
➢ reinforcement learning
• What is regression?
• What is classification?
chou555@affrc.go.jp
7. National Agriculture and Food Research Organization
Machine Learning Basics: What is ML?
• Limitations of explicit programming
➢Spam filter: many rules
➢Automatic driving: too many rules
• Machine Learning: “Field of study that gives
computers the ability to learn without being
explicitly programmed” Arthur Samuel (1959)
https://en.wikipedia.org/wiki/Arthur_Samuel
https://www.logianalytics.com/predictive-analytics/
machine-learning-vs-traditional-programming/
chou555@affrc.go.jp
8. National Agriculture and Food Research Organization
Machine Learning Basics: What is learning?
https://wendys.tistory.com/169
chou555@affrc.go.jp
9. National Agriculture and Food Research Organization
Machine Learning Basics: What is learning?
https://bigdata-madesimple.com/machine-learning-explained-understanding-
supervised-unsupervised-and-reinforcement-learning/
https://blogs.nvidia.com/blog/2018/08/02/supervised-unsupervised-learning/
Supervised learning
Most common problem type in ML.
ex) disease and insect pest detection, crop detection..
chou555@affrc.go.jp
10. National Agriculture and Food Research Organization
Machine Learning Basics: Types of supervised learning
http://sqlmvp.kr/221844842758
• Regression
➢ Predicting final exam score(0~100) based on time spent
• Binary classification
➢ Pass/non-pass based on time spent
• Multi-label classification
➢ Letter grade (A, B, C, D and F) based on time spent
https://youtu.be/qPMeuL2LIqY
chou555@affrc.go.jp
11. National Agriculture and Food Research Organization
Python Tutorial with Google Colab (1/3)
https://colab.research.google.com/github/cs231n/cs231n.github.io/blob/master/python-colab.ipynb
1. Go to drive.google.com
2. Log into your Google account with username and password
3. アドオン取得のアイコンをクリック
12. National Agriculture and Food Research Organization
Python Tutorial with Google Colab (2/3)
4. “Colaboratory” 検索
5. アイコンクリック
6. インストールクリック
7. “続行”クリック
8. IDクリック
https://colab.research.google.com/github/cs231n/cs231n.github.io/blob/master/python-colab.ipynb
13. National Agriculture and Food Research Organization
Python Tutorial with Google Colab (3/3)
9. 完了 10. その他→Google Colaboratory クリック
11. Hello World 出力テスト
https://colab.research.google.com/github/cs231n/cs231n.github.io/blob/master/python-colab.ipynb
14. National Agriculture and Food Research Organization
Linear Regression: Basic concepts
Predicting exam score using regression
Hours (x) Points (y)
1 2
2 4
3 6
4 ?
Training dataset
Test dataset
https://wikidocs.net/53560
Our goal(Prediction of Points using regression)
Dataset
𝑥𝑡𝑟𝑎𝑖𝑛 =
1
2
3
𝑦𝑡𝑟𝑎𝑖𝑛 =
2
4
6
x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[2], [4], [6]])
15. National Agriculture and Food Research Organization
Linear Regression: Hypothesis
Hours (x) Points (y)
1 2
2 4
3 6
4 ?
Dataset
0
1
2
3
4
5
6
7
0 1 2 3 4
𝑥𝑡𝑟𝑎𝑖𝑛 =
1
2
3
𝑦𝑡𝑟𝑎𝑖𝑛 =
2
4
6
𝑯 𝒙 = 𝑾𝒙 + 𝒃
Which hypothesis is better?
16. National Agriculture and Food Research Organization
Linear Regression: Hypothesis
Which hypothesis is better?
𝑯 𝒙 = 𝑾𝒙 + 𝒃
0
1
2
3
4
5
6
7
0 0.5 1 1.5 2 2.5 3 3.5
Hours (x) Points (y)
1 2
2 4
3 6
4 ?
Dataset
𝑥𝑡𝑟𝑎𝑖𝑛 =
1
2
3
𝑦𝑡𝑟𝑎𝑖𝑛 =
2
4
6
17. National Agriculture and Food Research Organization
Linear Regression: Cost Function
How fit the line to our training data?
Hours (x) Points (y)
1 2
2 4
3 6
4 ?
Dataset
𝑥𝑡𝑟𝑎𝑖𝑛 =
1
2
3
𝑦𝑡𝑟𝑎𝑖𝑛 =
2
4
6
𝑯 𝒙 = 𝑾𝒙 + 𝒃
(𝐻(𝑥 1
) − 𝑦(1)
)2
+(𝐻(𝑥 2
) − 𝑦(2)
)2
+(𝐻(𝑥 3
) − 𝑦(3)
)2
3
𝑐𝑜𝑠𝑡 𝑊, 𝑏 =
1
𝑚
𝑖=1
𝑚
(𝐻(𝑥 𝑖 ) − 𝑦(𝑖))2
Goal: Minimize Cost
Minimize Cost(W, b)
W, b
cost function = loss function = error function = objective function
18. National Agriculture and Food Research Organization
Linear Regression: Optimizer – Gradient Descent
接線の傾き=0
𝑯 𝒙 = 𝑾𝒙 + 𝒃, 𝒘𝒉𝒆𝒓𝒆 𝒃 = 𝟎
𝑊 ≔ 𝑊 − 𝛼
𝜕
𝜕𝑊
𝑐𝑜𝑠𝑡(𝑊)
Learning rate
What is learning rate?
https://en.wikipedia.org/wiki/Learning_rate
https://www.jeremyjordan.me/nn-learning-rate/
21. National Agriculture and Food Research Organization
Logistic Regression: Basic concepts
Binary Classification & Encoding
• Plant disease detection: disease(1) or non-disease(0)
• Target crop classification: target crop(1) or others(0)
• Facebook feed: show(1) or hide(0)
• Credit card fraudulent transaction detection: fraud(1) or legitimate(0)
22. National Agriculture and Food Research Organization
Logistic Regression: Basic concepts
http://hleecaster.com/ml-logistic-regression-concept/
Linear Regression Logistic Regression
Pass(1) or Fail(0) based on study hours
Sigmoid function = Logistic function
23. National Agriculture and Food Research Organization
Logistic Regression: Hypothesis using sigmoid function
𝐻 𝑥 = 𝑠𝑖𝑔𝑚𝑜𝑖𝑑 𝑊𝑥 + 𝑏 =
1
1 + 𝑒−(𝑊𝑥+𝑏)
= 𝜎(𝑊𝑥 + 𝑏)
Hypothesis Hypothesis of Linear Regression
24. National Agriculture and Food Research Organization
Logistic Regression: Cost function
𝑐𝑜𝑠𝑡 𝑊, 𝑏 =
1
𝑚
𝑖=1
𝑚
(𝐻(𝑥 𝑖 ) − 𝑦(𝑖))2
Cost function of Linear Regression :
𝐻 𝑥 = 𝑊𝑥 + 𝑏
Linear Regression
𝐻 𝑥 =
1
1 + 𝑒−(𝑊𝑥+𝑏)
Logistic Regression
https://laptrinhx.com/working-of-linear-regression-2055506697/
Convex
http://gemba-iot.com/?p=540
Non-Convex
34. National Agriculture and Food Research Organization
Multinomial Classification: Basic concepts
Linear Model
𝑊𝑥 + 𝑏
2.0
1.0
0.1
0.7
0.2
0.1
Logit Softmax
𝑆(𝑦)
𝑆(𝑦)
𝑦
𝑥
Input Data
1.0
0.0
0.0
1-Hot Encoding
𝐷(𝑆, 𝐿)
𝐿
Cross Entropy
Topic keywords: softmax, cross entropy, 1-hot encoding
35. National Agriculture and Food Research Organization
Multinomial Classification: Softmax function
𝑆(𝑦𝑖) =
𝑒𝑦𝑖
σ𝑗=1
𝑘
𝑒𝑦𝑗
Linear Model
𝑊𝑥 + 𝑏
2.0
1.0
0.1
0.7
0.2
0.1
Logit Softmax
𝑆(𝑦)
𝑆(𝑦)
𝑦
𝑥
Input Data
1.0
0.0
0.0
1-Hot Encoding
𝐷(𝑆, 𝐿)
𝐿
Cross Entropy
2.0
1.0
0.1
𝑦
LOGIT SCORES SOFTMAX
𝑝 = 0.7
𝑝 = 0.2
𝑝 = 0.1
PROBABILITIES
36. National Agriculture and Food Research Organization
Multinomial Classification: Cost function
Linear Model
𝑊𝑥 + 𝑏
2.0
1.0
0.1
0.7
0.2
0.1
Logit Softmax
𝑆(𝑦)
𝑆(𝑦)
𝑦
𝑥
Input Data
1.0
0.0
0.0
1-Hot Encoding
𝐷(𝑆, 𝐿)
𝐿
Cross Entropy
𝐷 𝑆, 𝐿 = −
𝑖
𝐿𝑖 log(𝑆𝑖)
Cross Entropy
c𝑜𝑠𝑡 𝑊 = −
1
𝑛
σ𝑖=1
𝑛 σ𝑗=1
𝑘
𝐿𝑗
(𝑖)
log(𝑝𝑗
(𝑖)
)
37. National Agriculture and Food Research Organization
Multinomial Classification: Practice using Colab
目標:アイリスの種類を分類する!
setosa versicolor virginica
https://archive.ics.uci.edu/ml/datasets/iris
https://gist.github.com/curran/a08a1080b88344b0c8a7#file-iris-csv
38. National Agriculture and Food Research Organization
Multinomial Classification: Practice using Colab
Exploratory data analysis (EDA)
39. National Agriculture and Food Research Organization
Multinomial Classification: Practice using Colab
Exploratory data analysis (EDA)
40. National Agriculture and Food Research Organization
Multinomial Classification: Practice using Colab
41. National Agriculture and Food Research Organization
Multinomial Classification: Practice using Colab
43. National Agriculture and Food Research Organization
Basics of Deep Neural Network: Perceptron
http://cs231n.stanford.edu/slides/2016/winter1516_lecture5.pdf
44. National Agriculture and Food Research Organization
Basics of Deep Neural Network: Single-Layer Perceptron
Single-Layer Perceptron
Input Layer Output Layer
45. National Agriculture and Food Research Organization
Basics of Deep Neural Network: Single-Layer Perceptron
AND gate
𝒙𝟏 𝒙𝟐 𝒚
0 0 0
0 1 0
1 0 0
1 1 1
NAND gate
𝒙𝟏 𝒙𝟐 𝒚
0 0 1
0 1 1
1 0 1
1 1 0
46. National Agriculture and Food Research Organization
Basics of Deep Neural Network: Single-Layer Perceptron
OR gate
𝒙𝟏 𝒙𝟐 𝒚
0 0 0
0 1 1
1 0 1
1 1 1
XOR gate
𝒙𝟏 𝒙𝟐 𝒚
0 0 0
0 1 1
1 0 1
1 1 0
Single-Layer Perceptronでは
XOR gateを作成不可能!!
Multi-Layer Perceptron
47. National Agriculture and Food Research Organization
Basics of Deep Neural Network: Single-Layer Perceptron
Implementation of XOR gate using PyTorch
48. National Agriculture and Food Research Organization
Basics of Deep Neural Network: Multi-Layer Perceptron(MLP)
XOR gate
𝒙𝟏 𝒙𝟐 𝒚
0 0 0
0 1 1
1 0 1
1 1 0
Input Layer Hidden Layer Output Layer
What is the difference between a neural network and a deep neural network, and why do the deep ones work better? - Cross Validated (stackexchange.com)
49. National Agriculture and Food Research Organization
Basic of Python and PyTorch: Practice using Colab
Vector, Matrix and Tensor
10 most common Maths Operation with Pytorch Tensor | by Nooras Fatima Ansari | Medium
2D Tensor t = (batch size, dim)
dim
batch_size
3D Tensor t = (batch size, width, height)
width
height
batch_size
50. National Agriculture and Food Research Organization
Basic of Python and PyTorch: Practice using Colab
Vector(1D Tensor)
51. National Agriculture and Food Research Organization
Basic of Python and PyTorch: Practice using Colab
Matrix(2D Tensor) t = (batch size, dim)
dim
batch_size
52. National Agriculture and Food Research Organization
Basic of Python and PyTorch: Practice using Colab
torch.matmul — PyTorch 1.7.0 documentation
53. National Agriculture and Food Research Organization
Basic of Python and PyTorch: Practice using Colab
torch.mean — PyTorch 1.7.0 documentation
54. National Agriculture and Food Research Organization
Basic of Python and PyTorch: Practice using Colab
torch.max — PyTorch 1.7.0 documentation
55. National Agriculture and Food Research Organization
Recap: Single-Layer Perceptron
Single-Layer Perceptron
Input Layer Output Layer
56. National Agriculture and Food Research Organization
Recap: Multi-Layer Perceptron(MLP)
XOR gate
𝒙𝟏 𝒙𝟐 𝒚
0 0 0
0 1 1
1 0 1
1 1 0
Input Layer Hidden Layer Output Layer
What is the difference between a neural network and a deep neural network, and why do the deep ones work better? - Cross Validated (stackexchange.com)
71. National Agriculture and Food Research Organization
Implement XOR using PyTorch: Practice using Colab
Input Layer
Hidden Layer1 Hidden Layer2 Hidden Layer3
Output Layer
BCELoss — PyTorch 1.7.0 documentation
72. National Agriculture and Food Research Organization
Implement XOR using PyTorch: Practice using Colab
73. National Agriculture and Food Research Organization
Recap: Forward & Back Propagation
𝑥1
𝑥2
𝑧1
𝑧2 ℎ2
ℎ1 𝑧3
𝑧4 𝑜2
𝑜1
Input Layer Hidden Layer Output Layer
𝑊1
𝑊2
𝑊3
𝑊4
𝑊5
𝑊6
𝑊7
𝑊8
Forward
Back Propagation
Loop:
1. Sample a batch of data
2. Forward propagation it through the graph, get loss
3. Back propagation to calculate the gradients
4. Update The parameters using the gradients
“Fully-connected” layers
winter1516_lecture4.pdf (stanford.edu)
74. National Agriculture and Food Research Organization
Neural Network Overview: A big problem
ML with Tensorflow (hunkim.github.io)
• Back propagation just did not work well for normal neural networks with many layer
• Other rising machine learning algorithms: SVM, RandomForest, etc.
• 1995 ”Comparison of Learning Algorithm For Handwritten Digit Recognition” by LeCun et.al
found that this new approach worked better
75. National Agriculture and Food Research Organization
Neural Network Overview: Breakthrough
• In 2006, Hinton, Simon Osindero and Yee-Whye Teh published, “A fast learning algorithm for deep belief nets”
• Yoshua Bengio et al. in 2007 with “Greedy Layer-Wise Training of Deep Networks”
Solutions
• Neural networks with many layers really could be trained well, if the weights are initialized in a clever way
rather than randomly.
• Deep machine learning methods are more efficient for difficult problems than shallow methods.
• Rebranding to Deep Nets, Deep Learning
ML with Tensorflow (hunkim.github.io)
76. National Agriculture and Food Research Organization
Neural Network Overview: Breakthrough
Geoffrey Hinton’s summary of findings up to today
• Our labeled datasets were thousands of times too small
• Our computers were millions of times too slow
• We initialized the weights in a stupid way
• We used the wrong type of non-linearity
A Brief History of Neural Nets and Deep Learning – Skynet Today
77. National Agriculture and Food Research Organization
Neural Network Overview: Better non-linearity
Problem of Sigmoid function
winter1516_lecture5.pdf (stanford.edu)
Active region Saturated region
Saturated region
→ Slow convergence problem
78. National Agriculture and Food Research Organization
Neural Network Overview: Better non-linearity
Back propagation (chain rule)
(×)
𝑧 = 𝑥𝑦
winter1516_lecture4.pdf (stanford.edu)
Vanishing Gradient (NN winter2: 1986 - 2006)
79. National Agriculture and Food Research Organization
Neural Network Overview: Better non-linearity
ReLU: Rectified Linear Unit
winter1516_lecture5.pdf (stanford.edu)
80. National Agriculture and Food Research Organization
Neural Network Overview: Better non-linearity
ReLU: Rectified Linear Unit
winter1516_lecture5.pdf (stanford.edu)
81. National Agriculture and Food Research Organization
Neural Network Overview: Better non-linearity
Why Sigmoid?. Ever since you have started to learn… | by Swain Subrat Kumar | núcleoML | Medium
General: ELU → Leaky ReLU → ReLU → tanh → sigmoid
CS231n: ReLU → Leaky ReLU or ELU
82. National Agriculture and Food Research Organization
Neural Network Overview: Practice using Colab
Activation Functions — ML Glossary documentation (ml-cheatsheet.readthedocs.io)
83. National Agriculture and Food Research Organization
Neural Network Overview: Practice using Colab
Activation Functions — ML Glossary documentation (ml-cheatsheet.readthedocs.io)
84. National Agriculture and Food Research Organization
Neural Network Overview: Practice using Colab
scikit-learn: machine learning in Python — scikit-learn 0.23.2 documentation (scikit-learn.org)
85. National Agriculture and Food Research Organization
Neural Network Overview: Practice using Colab
86. National Agriculture and Food Research Organization
Recap: Neural Network’s Breakthrough
Geoffrey Hinton’s summary of findings up to today
• Our labeled datasets were thousands of times too small
• Our computers were millions of times too slow
• We initialized the weights in a stupid way
• We used the wrong type of non-linearity
A Brief History of Neural Nets and Deep Learning – Skynet Today
87. National Agriculture and Food Research Organization
Recap: Better non-linearity
Back propagation (chain rule)
(×)
𝑧 = 𝑥𝑦
winter1516_lecture4.pdf (stanford.edu)
Vanishing Gradient (NN winter2: 1986 - 2006)
88. National Agriculture and Food Research Organization
Recap: Better non-linearity
Why Sigmoid?. Ever since you have started to learn… | by Swain Subrat Kumar | núcleoML | Medium
General: ELU → Leaky ReLU → ReLU → tanh → sigmoid
CS231n: ReLU → Leaky ReLU or ELU
89. National Agriculture and Food Research Organization
Recap: Neural Network’s Breakthrough
Geoffrey Hinton’s summary of findings up to today
• Our labeled datasets were thousands of times too small ✓
• Our computers were millions of times too slow ✓
• We initialized the weights in a stupid way → How to solve the problem??
• We used the wrong type of non-linearity✓
A Brief History of Neural Nets and Deep Learning – Skynet Today
90. National Agriculture and Food Research Organization
Weight initialization: Xavier Initialization
Input
Layer
Hidden
Layer-1
Hidden
Layer-2
Output
Layer
• Normal Initialization
𝑾~𝑵 𝟎, 𝝈 𝑾 𝟐
𝝈 𝑾 =
𝟐
𝒏𝒊𝒏𝒑𝒖𝒕 + 𝒏𝒐𝒖𝒕𝒑𝒖𝒕
where 𝑛𝑖𝑛𝑝𝑢𝑡 is number of the input layers,
𝑛𝑜𝑢𝑡𝑝𝑢𝑡 is number of the output layers.
• Uniform Initialization
𝑾~𝑼 −
𝟔
𝒏𝒊𝒏𝒑𝒖𝒕 + 𝒏𝒐𝒖𝒕𝒑𝒖𝒕
,
𝟔
𝒏𝒊𝒏𝒑𝒖𝒕 + 𝒏𝒐𝒖𝒕𝒑𝒖𝒕
※ここで、入力レイヤと出力レイヤの数がほぼほぼ同じであるばあいには、
𝝈 𝑾 = Τ
𝟏 𝒏𝒊𝒏𝒑𝒖𝒕, 𝑎 𝑜𝑟 𝑏 = Τ
𝟑 𝒏𝒊𝒏𝒑𝒖𝒕 を用いて初期化を行う。
Understanding the difficulty of training deep feedforward neural networks, http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
92. National Agriculture and Food Research Organization
Weight initialization: Xavier Initialization
Xavier Initialization* = Glorot Initialization
Understanding the difficulty of training deep feedforward neural networks, http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
Sigmoid, tanh などの活性化関数に利用する!!
93. National Agriculture and Food Research Organization
Weight initialization: Xavier Initialization
torch.nn.init — PyTorch 1.7.0 documentation
94. National Agriculture and Food Research Organization
Weight initialization: Xavier Initialization
torch.nn.init — PyTorch 1.7.0 documentation
95. National Agriculture and Food Research Organization
Weight initialization: He Initialization
Input
Layer
Hidden
Layer-1
Hidden
Layer-2
Output
Layer
• Normal Initialization
𝑾~𝑵 𝟎, 𝝈 𝑾 𝟐
𝝈 𝑾 =
𝟐
𝒏𝒊𝒏𝒑𝒖𝒕
where 𝑛𝑖𝑛𝑝𝑢𝑡 is number of the input layers,
• Uniform Initialization
𝑾~𝑼 −
𝟔
𝒏𝒊𝒏𝒑𝒖𝒕
,
𝟔
𝒏𝒊𝒏𝒑𝒖𝒕
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,
[1502.01852] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet
Classification (arxiv.org)
97. National Agriculture and Food Research Organization
Weight initialization: He Initialization
ReLU系列の活性化関数に利用する!!
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,
[1502.01852] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet
Classification (arxiv.org)
98. National Agriculture and Food Research Organization
Weight initialization: He Initialization
torch.nn.init — PyTorch 1.7.0 documentation
99. National Agriculture and Food Research Organization
Weight initialization: He Initialization
torch.nn.init — PyTorch 1.7.0 documentation
100. National Agriculture and Food Research Organization
Weight initialization: What’s different? (Xavier VS He)
𝑾~𝑵 𝟎, 𝝈 𝑾 𝟐
𝝈 𝑾 =
𝟐
𝒏𝒊𝒏𝒑𝒖𝒕
𝑾~𝑵 𝟎, 𝝈 𝑾 𝟐
𝝈 𝑾 =
𝟐
𝒏𝒊𝒏𝒑𝒖𝒕 + 𝒏𝒐𝒖𝒕𝒑𝒖𝒕
101. National Agriculture and Food Research Organization
Weight initialization: What’s different? (Xavier VS He)
102. National Agriculture and Food Research Organization
Weight initialization: What’s different? (Xavier VS He)
machine learning - Xavier and he_normal initialization difference - Stack Overflow
103. National Agriculture and Food Research Organization
Weight initialization: Proper initialization..
winter1516_lecture5.pdf (stanford.edu)
105. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN)
Background
Local Receptive Field
http://cs231n.stanford.edu/slides/2016/winter1516_lecture7.pdf
106. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN)
http://cs231n.stanford.edu/slides/2016/winter1516_lecture7.pdf
107. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN)
Up: A regular 3-layer Neural Network. Down: A
ConvNet arranges its neurons in three
dimensions (width, height, depth), as visualized in
one of the layers. Every layer of a ConvNet
transforms the 3D input volume to a 3D output
volume of neuron activations. In this example, the
red input layer holds the image, so its width and
height would be the dimensions of the image, and
the depth would be 3 (Red, Green, Blue
channels).
https://cs231n.github.io/convolutional-networks/
108. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN)
Raw Image
Filter
Fully connected layer Convolutional layer
VS.
https://yjjo.tistory.com/8
1 2 3
4 5 6
7 8 9
×
1 2
3 4
Raw Image
Filter
= Kernel
= Window
1
2
3
4
5
6
7
8
9
Flatten
=
37 47
67 77
Feature map
= Activation map
(1*1) + (2*2) + (4*3) + (5*4) = 37
Intuitively Understanding Convolutions for Deep
Learning | by Irhum Shafkat | Towards Data Science
Fully connected layer vs. Convolutional layer
Sliding Window
109. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN)
Convolution vs. Cross-Correlation
𝑓 ∗ 𝑔 𝑡 = න
−∞
∞
𝑓 𝜏 𝑔 𝑡 − 𝜏 𝑑𝜏
Definition of convolution
https://yjjo.tistory.com/8
https://en.wikipedia.org/wiki/Cross-correlation
Convolution vs. Cross-Correlation – Glass Box (glassboxmedicine.com)
110. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN)
2D Convolution: The Operation (Standard)
Intuitively Understanding Convolutions for Deep Learning | by Irhum Shafkat | Towards Data Science
Raw Image
= Input features
= 5 X 5 = 25
Output features
= 3 X 3 = 9
N
N
𝑶𝒖𝒕𝒑𝒖𝒕 𝒔𝒊𝒛𝒆 =
(𝑵 − 𝑭)
𝒔𝒕𝒓𝒊𝒅𝒆
+ 𝟏
Where, N = Input size, F = Filter size, stride
N = 5 X 5, F = 3 X 3, stride = 1
➞ Output features = (5 - 3) / 1 + 1 = 3
111. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN)
2D Convolution: The Operation (padding = 1, stride = 2)
𝑶𝒖𝒕𝒑𝒖𝒕 𝒔𝒊𝒛𝒆 =
(𝑵 − 𝑭)
𝒔𝒕𝒓𝒊𝒅𝒆
+ 𝟏
Where, N = Input size, F = Filter size, stride
N = 7 X 7 (raw: 5 X 5, padding: 1), F = 3 X 3, stride = 2
➞ Output features = (7 - 3) / 2 + 1 = 3
https://github.com/vdumoulin/conv_arithmetic/blob/master/gif/padding_strides.gif
https://yjjo.tistory.com/8
112. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN)
Multi-channel Convolution: The Operation
Intuitively Understanding Convolutions for Deep
Learning | by Irhum Shafkat | Towards Data Science
http://taewan.kim/post/cnn/
113. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN)
https://cs231n.github.io/convolutional-networks/
114. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN)
CNN 入出力パラメータ計算
Activation
function
SUM
各レイヤ別学習パラメータ数=
入力チャンネル数 X フィルタW X フィルタH X 出力チャンネル数
例1: Convolutional Layer1 = 1 X 4 X 4 X 20 = 320
例2: Fully connected layer = 160 X 100 = 160,000
http://taewan.kim/post/cnn/
Num. of
params
115. National Agriculture and Food Research Organization
FC Neural Network 入出力パラメータ計算
Convolutional Neural Network (CNN)
Layer Input Node Output Node Num. of parameters
SUM
各レイヤ別学習パラメータ数=
入力チャンネル数 X フィルタW X フィルタH X 出力チャンネル数
http://taewan.kim/post/cnn/
CNN vs. FC neural network(パラメータ数)
CNN(208,320) , FC(1,055,400)でCNNが少ないパラメータで構成される。
116. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN): Practice using Colab
LeNet (LeNet-5 using MNIST dataset)
Layer Input
Channel
Filter Output
Channel
Stride Pooling Activation
Function
Input
shape
Output
shape
Num of
parameters
Conv1 1 (5, 5) 6 1 X ReLU (28, 28, 1) (28, 28, 6) 150 + 6
Avg Pooling1 6 X 6 2 (2, 2) X (28, 28, 6) (14, 14, 6) 0
Conv2 6 (5, 5) 16 1 X ReLU (14, 14, 6) (10, 10, 16) 2400 + 16
Avg Pooling2 16 X 16 2 (2, 2) X (10, 10, 16) (5, 5, 16) 0
FC1 X X X X X ReLU (5X5X16, 1) (120, 1) 30720 + 120
FC2 X X X X X ReLU (120, 1) (84, 1) 10080 + 84
FC3 X X X X X ReLU (84, 1) (10, 1) 840 + 10 SUM: 44,426
flatten
LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
Paper: 32 X 32
Paper: 32 X 32
117. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN): Practice using Colab
https://pytorch.org/docs/stable/torchvision/datasets.html#mnist
MNIST dataset
https://en.wikipedia.org/wiki/MNIST_database
• The MNIST database (Modified National Institute of Standards and
Technology database) is a large database of handwritten digits that is
commonly used for training various image processing systems.
• the black and white images from NIST were normalized to fit into a 28x28
pixel bounding box and anti-aliased, which introduced grayscale levels.
• The MNIST database contains 60,000 training images and 10,000 testing
images.
https://en.wikipedia.org/wiki/MNIST_database
118. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN): Practice using Colab
mnist mean: 0.13066047430038452
mnist_std: 0.30810779333114624
Data processing (MNIST)
119. National Agriculture and Food Research Organization
Data processing (MNIST)
Convolutional Neural Network (CNN): Practice using Colab
120. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN): Practice using Colab
Defining the model(LeNet)
SUM: 44,426
121. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN): Practice using Colab
Training the model(LeNet)
122. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN): Practice using Colab
Test the model(LeNet)
123. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN): Practice using Colab
Test the model(LeNet)
124. National Agriculture and Food Research Organization
Convolutional Neural Network (CNN)
Top 10 CNN Architectures Every Machine Learning Engineer Should Know, Top 10 CNN Architectures
Every Machine Learning Engineer Should Know | by Trung Anh Dang | Jan, 2021 | Towards Data Science
http://cs231n.stanford.edu/slides/2020/lecture_9.pdf
125. National Agriculture and Food Research Organization
Recap: Convolutional Neural Network
各レイヤ別学習パラメータ数=
入力チャンネル数 X フィルタW X フィルタH X 出力チャンネル数
𝑶𝒖𝒕𝒑𝒖𝒕 𝒔𝒊𝒛𝒆 =
(𝑵 − 𝑭)
𝒔𝒕𝒓𝒊𝒅𝒆
+ 𝟏
Where, N = Input size, F = Filter size, stride
N = 7 X 7 (raw: 5 X 5, padding: 1), F = 3 X 3, stride = 2
➞ Output features = (7 - 3) / 2 + 1 = 3
https://github.com/vdumoulin/conv_arithmetic/blob/master/gif/padding_strides.gif
LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
Params: 44,426
LeNet-5
126. National Agriculture and Food Research Organization
CNN: AlexNet
ImageNet Classification with Deep Convolutional Neural Networks
http://cs231n.stanford.edu/slides/2020/lecture_9.pdf
https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
Main Contributions
• ReLU
• LRN
• multi-gpu
• overlapping pooling
• reduce overfitting
➢ data augmentation
➢ dropout
127. National Agriculture and Food Research Organization
CNN: AlexNet
https://starpentagon.net/analytics/imagenet_ilsvrc2012_dataset/
https://starpentagon.net/analytics/imagenet_ilsvrc2012_dataset/
128. National Agriculture and Food Research Organization
CNN: AlexNet
https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
130. National Agriculture and Food Research Organization
CNN: AlexNet
Main Contributions
• ReLU
• LRN
• multi-gpu
• overlapping pooling
• reduce overfitting
➢ data augmentation
➢ dropout
131. National Agriculture and Food Research Organization
CNN: AlexNet
Main Contributions
• ReLU
• LRN
• multi-gpu
• overlapping pooling
• reduce overfitting
➢ data augmentation
➢ dropout
Hermann grid
最近、CNNで利用する主流のNormalization手法はLRNではなく、
Batch Normalizationである。
later inhibition
132. National Agriculture and Food Research Organization
CNN: AlexNet
Main Contributions
• ReLU
• LRN
• multi-gpu
• overlapping pooling
• reduce overfitting
➢ data augmentation
➢ dropout
https://blueskyvision.tistory.com/421
133. National Agriculture and Food Research Organization
CNN: AlexNet
https://blueskyvision.tistory.com/421
Main Contributions
• ReLU
• LRN
• multi-gpu
• overlapping pooling
• reduce overfitting
➢ data augmentation
➢ dropout
134. National Agriculture and Food Research Organization
https://cs231n.github.io/neural-networks-2/
CNN: AlexNet
www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf
Main Contributions
• ReLU
• LRN
• multi-gpu
• overlapping pooling
• reduce overfitting
➢ data augmentation
➢ dropout
https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html
135. National Agriculture and Food Research Organization
AlexNet: Practice using Colab
Web site: https://en.d2l.ai/index.html
Pdf download: https://d2l.ai/d2l-en.pdf
136. National Agriculture and Food Research Organization
AlexNet: Practice using Colab
Data Processing: Load fashion MNIST dataset
https://www.kaggle.com/zalando-research/fashionmnist
137. National Agriculture and Food Research Organization
AlexNet: Practice using Colab
https://colab.research.google.com/github/d2l-ai/d2l-pytorch-colab/blob/master/chapter_convolutional-modern/alexnet.ipynb
https://en.d2l.ai/chapter_convolutional-modern/alexnet.html
256
Params: 44,426
Params: 62.3 million
138. National Agriculture and Food Research Organization
AlexNet: Practice using Colab
Training
https://github.com/d2l-ai/d2l-en/blob/master/chapter_convolutional-neural-networks/lenet.md
Batch size vs epochs
https://jerryan.medium.com/batch-size-a15958708a6
139. National Agriculture and Food Research Organization
History of Convolutional Neural Network (CNN)
Top 10 CNN Architectures Every Machine Learning Engineer Should Know, Top 10 CNN Architectures
Every Machine Learning Engineer Should Know | by Trung Anh Dang | Jan, 2021 | Towards Data Science
http://cs231n.stanford.edu/slides/2020/lecture_9.pdf
140. National Agriculture and Food Research Organization
Top 10 CNN Architectures Every Machine Learning Engineer Should Know, Top 10 CNN Architectures
Every Machine Learning Engineer Should Know | by Trung Anh Dang | Jan, 2021 | Towards Data Science
http://cs231n.stanford.edu/slides/2020/lecture_9.pdf
VGG(Visual Geometry Group)Net
141. National Agriculture and Food Research Organization
VGG(Visual Geometry Group)Net
https://arxiv.org/pdf/1409.1556.pdf%20http://arxiv.org/abs/1409.1556.pdf
貢献
1. 16-19レイヤで構成したアーキテクチャの
学習のための手法
2. 全Convolutional layerで3X3フィルタを適用
142. National Agriculture and Food Research Organization
VGG(Visual Geometry Group)Net
https://neurohive.io/en/popular-networks/vgg16/
143. National Agriculture and Food Research Organization
VGG(Visual Geometry Group)Net
https://medium.com/@msmapark2/vgg16-
%EB%85%BC%EB%AC%B8-%EB%A6%AC%EB%B7%B0-
very-deep-convolutional-networks-for-large-scale-image-
recognition-6f748235242a
With BatchNorm
144. National Agriculture and Food Research Organization
VGG(Visual Geometry Group)Net
7X7 VS. 3X3 Filter
𝑶𝒖𝒕𝒑𝒖𝒕 𝒔𝒊𝒛𝒆 =
(𝑵 − 𝑭)
𝒔𝒕𝒓𝒊𝒅𝒆
+ 𝟏
各レイヤ別学習パラメータ数=
入力チャンネル数 X フィルタW X フィルタH X 出力チャンネル数
10X10
4X4
10X10
8X8
6X6
4X4
VGG16 논문 리뷰 — Very Deep Convolutional Networks for Large-Scale Image Recognition | by 강준영 | Medium
145. National Agriculture and Food Research Organization
VGG(Visual Geometry Group)Net: Practice using Colab
Data processing (102 Category Flower Dataset)
https://www.robots.ox.ac.uk/~vgg/data/flowers/102/index.html
https://www.robots.ox.ac.uk/~vgg/data/flowers/102/categories.html
146. National Agriculture and Food Research Organization
VGG(Visual Geometry Group)Net: Practice using Colab
Data processing (102 Category Flower Dataset)
https://www.robots.ox.ac.uk/~vgg/data/flowers/102/index.html
147. National Agriculture and Food Research Organization
VGG(Visual Geometry Group)Net: Practice using Colab
Data processing (102 Category Flower Dataset)
https://www.robots.ox.ac.uk/~vgg/data/flowers/102/index.html
148. National Agriculture and Food Research Organization
VGG(Visual Geometry Group)Net: Practice using Colab
Data processing (102 Category Flower Dataset)
https://www.robots.ox.ac.uk/~vgg/data/flowers/102/index.html
149. National Agriculture and Food Research Organization
VGG(Visual Geometry Group)Net: Practice using Colab
150. National Agriculture and Food Research Organization
VGG(Visual Geometry Group)Net: Practice using Colab
151. National Agriculture and Food Research Organization
VGG(Visual Geometry Group)Net: Practice using Colab
152. National Agriculture and Food Research Organization
VGG(Visual Geometry Group)Net: Practice using Colab
153. National Agriculture and Food Research Organization
VGG(Visual Geometry Group)Net: Practice using Colab
154. National Agriculture and Food Research Organization
Top 10 CNN Architectures Every Machine Learning Engineer Should Know, Top 10 CNN Architectures
Every Machine Learning Engineer Should Know | by Trung Anh Dang | Jan, 2021 | Towards Data Science
http://cs231n.stanford.edu/slides/2020/lecture_9.pdf
GoogLeNet / InceptionNet
155. National Agriculture and Food Research Organization
GoogLeNet / InceptionNet
https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf
156. National Agriculture and Food Research Organization
GoogLeNet / InceptionNet
Network in Network 非線形関係を表現するために1X1 Convを導入
Cross Channel Parametric Pooling (CCCP)
1X1 Conv. のメリット
1. 非線形関係を表現可能 (Representation of non-linear)
2. 次元縮小 (Dimension reduction)
https://arxiv.org/pdf/1312.4400.pdf%20http://arxiv.org/abs/1312.4400.pdf https://kangbk0120.github.io/articles/2018-01/inception-googlenet-review
157. National Agriculture and Food Research Organization
GoogLeNet / InceptionNet
Architectural Details
GoogLeNetで利用したInception Module
1X1 Conv. のメリット
1. 非線形関係を表現可能 (Representation of non-linear) + ReLU
2. 次元縮小 (Dimension reduction)
158. National Agriculture and Food Research Organization
GoogLeNet / InceptionNet
Architectural Details
https://deep-learning-study.tistory.com/389
159. National Agriculture and Food Research Organization
GoogLeNet / InceptionNet
Architectural Details
Convolution
Pooling
Softmax
Concatenation / Nomalize
160. National Agriculture and Food Research Organization
GoogLeNet / InceptionNet: Practice using Colab
Data processing (102 Category Flower Dataset)
https://www.robots.ox.ac.uk/~vgg/data/flowers/102/index.html
https://www.robots.ox.ac.uk/~vgg/data/flowers/102/categories.html
VGG16の例と同じデータセットを利用しています。
161. National Agriculture and Food Research Organization
GoogLeNet / InceptionNet: Practice using Colab
GoogLeNet Code Review (Inception module)
https://github.com/pytorch/vision/blob/master/torchvision/models/googlenet.py
162. National Agriculture and Food Research Organization
GoogLeNet / InceptionNet: Practice using Colab
GoogLeNet Code Review (Auxiliary classifier)
https://github.com/pytorch/vision/blob/master/torchvision/models/googlenet.py
163. National Agriculture and Food Research Organization
GoogLeNet / InceptionNet: Practice using Colab
GoogLeNet Code Review (Architecture)
164. National Agriculture and Food Research Organization
GoogLeNet / InceptionNet: Practice using Colab
ImageNetは1000クラス区分→Flower Datasetは102クラス区分が必要
165. National Agriculture and Food Research Organization
GoogLeNet / InceptionNet: Practice using Colab
166. National Agriculture and Food Research Organization
Top 10 CNN Architectures Every Machine Learning Engineer Should Know, Top 10 CNN Architectures
Every Machine Learning Engineer Should Know | by Trung Anh Dang | Jan, 2021 | Towards Data Science
http://cs231n.stanford.edu/slides/2020/lecture_9.pdf
ResNet
He Initialization
167. National Agriculture and Food Research Organization
ResNet
https://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf
168. National Agriculture and Food Research Organization
ResNet
https://axa.biopapyrus.jp/deep-learning/cnn/image-classification/resnet.html
Shortcut connection = Skip connection
169. National Agriculture and Food Research Organization
ResNet
https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py
170. National Agriculture and Food Research Organization
ResNet
https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py
173. National Agriculture and Food Research Organization
ResNet: Practice using Colab
Data processing (102 Category Flower Dataset)
https://www.robots.ox.ac.uk/~vgg/data/flowers/102/index.html
https://www.robots.ox.ac.uk/~vgg/data/flowers/102/categories.html
VGG16とGoogLeNetの例と同じデータセットを利用しています。
174. National Agriculture and Food Research Organization
ResNet: Practice using Colab
https://www.programmersought.com/article/68543552068/
https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py
176. National Agriculture and Food Research Organization
ResNet: Practice using Colab
https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py
177. National Agriculture and Food Research Organization
ResNet: Practice using Colab
https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py