17. Mixed-Precision CNN
• Essential for complex tasks such as object detection
• Front stage: binary-precision CNN … small area and high speed
• Back stage: multi-bit-precision CNN … regression (bounding-box estimation)
[Figure: detection pipeline: input image (frame) → binary CONV+Pooling layers produce feature maps → half-precision stage outputs the class scores and bounding boxes for detection]
H. Nakahara et al., “A Lightweight YOLOv2: A Binarized CNN with A Parallel Support Vector Regression for an
FPGA,” Int’l Symp. on FPGA (ISFPGA), 2018.
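As a rough illustration of the two-stage idea (a hypothetical NumPy sketch, not the paper's implementation): a binarized front stage extracts features cheaply, and a half-precision (fp16) head handles the bounding-box regression. All shapes and names here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(w):
    """Map real-valued weights to {-1, +1} (sign, with 0 mapped to +1)."""
    return np.where(w >= 0, 1.0, -1.0).astype(np.float32)

# Hypothetical flattened front stage: binary weights, ReLU activation.
x = rng.standard_normal((1, 64)).astype(np.float32)       # input features
w_front = binarize(rng.standard_normal((64, 32)))          # binary front stage
feat = np.maximum(x @ w_front, 0)                          # feature maps

# Back stage in half precision for bounding-box regression (4 coordinates).
w_head = rng.standard_normal((32, 4)).astype(np.float16)
bbox = feat.astype(np.float16) @ w_head                    # (x, y, w, h) estimate

print(bbox.shape)
```

The point of the split: the front stage's multiplications collapse to sign flips (cheap in LUTs), while only the small regression head pays for multi-bit arithmetic.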
18. Distillation
• Transfers a trained model's knowledge to another model
• Works across different architectures (number of layers, channels, etc.)
• Training by distillation: propagate all of the teacher model's scores
→ the score distribution contains generalizable knowledge
G. Hinton, O. Vinyals, and J. Dean, “Distilling the Knowledge in a Neural Network,” NIPS’14 Deep Learning Workshop.
[Figure: the teacher (trained) CNN produces soft scores (Car 0.82, Cat 0.08, Dog 0.07, Pet 0.03); the student CNN produces (Car 0.62, Cat 0.12, Dog 0.24, Pet 0.02); the training dataset supplies one-hot hard targets (Car 1.00, Cat 0.00, Dog 0.00, Pet 0.00); training combines a soft-target loss against the teacher's scores with a hard-target loss against the labels]
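The combined loss in the figure can be sketched as follows. The temperature T, the mixing weight alpha, and the logit values are illustrative choices of mine, not taken from the slide:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; T > 1 softens the distribution."""
    z = np.asarray(z, dtype=np.float64) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """alpha * cross-entropy vs. teacher soft targets (at temperature T)
       + (1 - alpha) * cross-entropy vs. the one-hot hard target."""
    p_teacher = softmax(teacher_logits, T)
    soft_loss = -np.sum(p_teacher * np.log(softmax(student_logits, T)))
    hard_loss = -np.log(softmax(student_logits)[hard_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Illustrative logits for (Car, Cat, Dog, Pet); class 0 (Car) is the hard label.
student = np.array([2.0, 0.3, 1.0, -1.0])
teacher = np.array([3.0, 0.5, 0.4, -0.5])
loss = distillation_loss(student, teacher, hard_label=0)
print(round(loss, 3))
```

This is why the slide says the score distribution carries generalizable knowledge: the soft-target term penalizes the student for getting the *relative* scores of wrong classes (Cat vs. Dog) wrong, not just the top-1 label.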
21. Google Colaboratory
• A GPU (Tesla K80) is available for up to 12 hours
• Required libraries are pre-installed
• TensorFlow is available
• Chainer can also be installed
Chainer on Google Colaboratory:
https://github.com/chainer/google-colaboratory
• There is even a way to use GUINNESS (a Binary Neural Network design tool) on Colaboratory!
Running a Binary CNN on Google Colaboratory (MNIST):
http://shimaharu.blogspot.com/2018/11/google-colaboratorybinary-cnnmnist.html
• Some people have even tried installing Vivado…
• You can do everything on a smartphone!
Chainer is supported by default (confirmed 2019/Jan/31)
22. On-going work
• Started developing Coca-cola DL
• Co-design and verification on Colaboratory for Deep Learning
https://github.com/HirokiNakahara/Coca‐Cola‐DL/
53. Comparison with existing implementations

| Implementation (Year)    | Zhao et al. (2017) [1] | FINN (2017) [2] | Boucle et al. (2017) [3] | Ours (2019) |
|--------------------------|------------------------|-----------------|--------------------------|-------------|
| CNN                      | Binary                 | Binary          | Ternary                  | Noise       |
| Clock (MHz)              | 143                    | 166             | 250                      | 199         |
| #LUTs                    | 46,900                 | 42,823          | 67,300                   | 40,911      |
| #18Kb BRAMs              | 94                     | 270             | 667                      | 228         |
| #DSP48Es                 | 3                      | 32              | 0                        | 192         |
| Accuracy (%)             | 87.73                  | 80.10           | 86.71                    | 92.35       |
| Time [msec] (FPS [s⁻¹])  | 5.94 (168)             | 2.24 (445)      | 2.36 (423)               | 1.80 (557)  |
| Power [W]                | 4.7                    | 2.5             | 6.8                      | 3.5         |

Faster and more accurate than the binary and ternary designs, although DSP blocks are required.
Evaluated with a VGG9-based CNN; the dataset is CIFAR10.
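As a quick sanity check on the table (my own arithmetic, not from the slide), FPS can be derived as 1000 / latency[msec], and the proposed design's speedup over the binary baseline [1] follows directly:

```python
# Latency in msec, taken from the comparison table above.
latency = {"Zhao [1]": 5.94, "FINN [2]": 2.24, "Boucle [3]": 2.36, "Ours": 1.80}

for name, t in latency.items():
    # FPS derived from latency; the slide's FPS column may differ by a frame
    # or two, so it was likely measured rather than computed.
    print(f"{name}: {1000.0 / t:.0f} FPS")

speedup = latency["Zhao [1]"] / latency["Ours"]
print(f"speedup over [1]: {speedup:.1f}x")
```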
[1] R. Zhao, W. Song, W. Zhang, T. Xing, J.-H. Lin, M. Srivastava, R. Gupta, and Z. Zhang, “Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs,” ISFPGA, 2017, pp. 15-24.
[2] Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. Leong, M. Jahre, and K. Vissers, “FINN: A Framework for Fast, Scalable Binarized Neural Network Inference,” ISFPGA, 2017.
[3] A. Prost-Boucle, A. Bourge, F. Pétrot, H. Alemdar, N. Caldwell, and V. Leroy, “Scalable High-Performance Architecture for Convolutional Ternary Neural Networks on FPGA,” FPL, 2017, pp. 1-7.