ROS User Group Meeting #28: Multi-Task Deep Learning and ROS

Hiroki Nakahara
Tokyo Institute of Technology
Toward Highly Capable Robots with Multiple Deep Learning Tasks and ROS

About Me
• Hiroki Nakahara
https://github.com/HirokiNakahara/
• I want to build fast computers
• Multiple-valued logic: ternary, RNS
• Computer architecture: FPGA, ASIC
• Deep learning
• Tools: GUINNESS (GUI-based neural network synthesizer)

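GUINNESS Tool Flow
(Figure: tool flow from training to FPGA implementation)
• The user supplies training images and operates the tool through the GUI
• Inputs: Image Data (.pkl), Label Data (.txt), CNN Spec. (.py)
• Training runs on a GPU with Chainer and produces the binarized CNN weights (.model)
• "Chainer to C++" generates the PS code (.cpp) and the PL code (.cpp)
• "Model to Text" generates the binarized weights (.txt)
• PS code → gcc → executable data (.elf) → PS
• PL code → HLS → bitstream (.bit) → PL; the bitstream is generated with the FPGA vendor's system design tool
• Target FPGA: PS, PL, and BRAM
https://github.com/HirokiNakahara/GUINNESS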

Deep Learning is Everywhere

Applications
• Robotics, autonomous driving, security, drones…

Object Detection
(Figure: detection results with "Person" and "Boat" labels)
J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv, 2018.

Semantic Segmentation
E. Shelhamer, J. Long and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 39, No. 4, 2017, pp. 640-651.

Pose Estimation
Z. Cao, T. Simon, S.-E. Wei and Y. Sheikh, "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields," CVPR, 2017.

Depth Map Prediction
D. Eigen, C. Puhrsch and R. Fergus, "Depth Map Prediction from a Single Image using a Multi-Scale Deep Network," arXiv:1406.2283, 2014.

Demo: Low-end FPGA
Xilinx Inc. Zynq UltraScale+ MPSoC (ZU3EG)
30 FPS (YOLOv2)

Demo: High-end FPGA
Intel Corp. Arria 10 GX FPGA (10AX115N2F45E1SG)
166 FPS (YOLOv2)

Demo
Segmentation (AlexNet-based FCN), 50 FPS

The Serious Reason
• Preparing for a super-aged society
• Home Support Robot (HSR)

The Real Reason
• I want to build robots!

Convolutional Neural Network (CNN)
• Convolutional + fully connected + pooling layers
• State-of-the-art performance in image recognition tasks
• Widely applicable
Source: https://www.mathworks.com/discovery/convolutional-neural-network.html
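The three layer types listed on this slide can be sketched as a tiny forward pass. This is only a minimal NumPy illustration: the input size, the number of filters, the number of classes, and the random weights are all made-up assumptions, not a trained network or any architecture from the talk.

```python
import numpy as np

def conv2d(x, kernels):
    """Valid 2D cross-correlation of a single-channel image with several kernels."""
    kh, kw = kernels.shape[1:]
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((kernels.shape[0], oh, ow))
    for k, kernel in enumerate(kernels):
        for i in range(oh):
            for j in range(ow):
                out[k, i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 on (channels, H, W); H and W must be even."""
    c, h, w = fmap.shape
    return fmap.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def fully_connected(features, weights, bias):
    """Dense layer: one weighted sum per output."""
    return weights @ features + bias

rng = np.random.default_rng(0)
image = rng.standard_normal((10, 10))        # hypothetical 10x10 grayscale input
kernels = rng.standard_normal((4, 3, 3))     # 4 hypothetical 3x3 filters
fc_w = rng.standard_normal((5, 4 * 4 * 4))   # 5 hypothetical output classes
fc_b = np.zeros(5)

feature_maps = np.maximum(conv2d(image, kernels), 0.0)    # convolution + ReLU, shape (4, 8, 8)
pooled = max_pool2x2(feature_maps)                        # pooling, shape (4, 4, 4)
scores = fully_connected(pooled.reshape(-1), fc_w, fc_b)  # fully connected, shape (5,)
predicted_class = int(np.argmax(scores))
```

Nearly all of the arithmetic sits in the innermost loop of `conv2d`, which is a multiply-accumulate.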

Deep Learning Inference Device
(Figure: devices compared on two axes, flexibility and power performance efficiency)
• CPU (Raspberry Pi 3)
• GPU (Jetson TX2)
• FPGA (UltraZed)
• ASIC (Movidius)
• Flexibility: R&D cost of keeping up with ever-evolving algorithms
• Power performance efficiency
• FPGA offers both flexibility and better performance

Field Programmable Gate Array (FPGA)
• Millions of Look-Up Tables (LUTs)
• Thousands of on-chip memories and DSP blocks
• Programmable channels
• Dedicated IP macros (PCIe, DDR, MPU)
Source: Intel Corp. Arria 10 FPGA Device Family Overview

Improvements by Binarization
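(Figure: a neuron with inputs x1, x2, ..., xn, weights w1, w2, ..., wn, and bias w0; the weighted sum Y passes through fsgn(Y) to give the output z)

Multiplication with -1/+1 values:
x1  x2  Y
-1  -1  +1
-1  +1  -1
+1  -1  -1
+1  +1  +1

The same operation with 0/1 values (EXNOR):
x1  x2  Y
0   0   1
0   1   0
1   0   0
1   1   1

• EXNORs in place of multipliers → many MACs
• Binary precision → on-chip memory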
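The two truth tables on this slide say that multiplying -1/+1 values is the same as applying EXNOR to their 0/1 encodings. A minimal sketch of what that means for a whole dot product follows: count the matching bits and rescale. The vector length of 64, the omission of the bias w0, and the choice of +1 for the sign of zero are assumptions for illustration.

```python
import numpy as np

def dot_pm1(x, w):
    """Reference: ordinary multiply-accumulate on -1/+1 values."""
    return int(np.dot(x, w))

def dot_xnor(x_bits, w_bits):
    """Same result from 0/1 encodings: XNOR each pair, count the ones, rescale."""
    n = len(x_bits)
    matches = int(np.count_nonzero(x_bits == w_bits))  # XNOR is 1 where the bits agree
    return 2 * matches - n                             # (#matches) - (#mismatches)

def sign(y):
    """Binary activation: +1 for y >= 0, otherwise -1 (the y == 0 case is an assumption)."""
    return 1 if y >= 0 else -1

rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=64)   # hypothetical binarized inputs
w = rng.choice([-1, 1], size=64)   # hypothetical binarized weights
x_bits = (x + 1) // 2              # map -1 -> 0 and +1 -> 1
w_bits = (w + 1) // 2

y_mac = dot_pm1(x, w)
y_xnor = dot_xnor(x_bits, w_bits)  # equals y_mac
z = sign(y_xnor)
```

In hardware the comparison and the count would be a row of EXNOR gates feeding a popcount circuit, so no multiplier is needed.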

Near Memory Realization by Binarization
(Figures: memory comparison with on-chip memory highlighted)
J. Emer et al., "Tutorial on Hardware Architectures for Deep Neural Networks," MICRO-49, 2016.
J. Dean, "Numbers everyone should know," source: https://gist.github.com/2841832
• High bandwidth (left)
• Lower power consumption (right)
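A quick calculation shows why binarization makes on-chip storage plausible. The network size below is a hypothetical round number, not a figure from the talk, and whether the result fits in on-chip memory depends on the device.

```python
def weight_storage_bytes(num_weights, bits_per_weight):
    """Bytes needed to store the weights, rounded up to a whole byte."""
    return (num_weights * bits_per_weight + 7) // 8

NUM_WEIGHTS = 10_000_000  # hypothetical network size (assumption)

float32_bytes = weight_storage_bytes(NUM_WEIGHTS, 32)
binary_bytes = weight_storage_bytes(NUM_WEIGHTS, 1)
reduction = float32_bytes / binary_bytes

print(f"float32: {float32_bytes / 1e6:.2f} MB")  # float32: 40.00 MB
print(f"binary:  {binary_bytes / 1e6:.2f} MB")   # binary:  1.25 MB
print(f"reduction: {reduction:.0f}x")            # reduction: 32x
```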

PYNQ + High-Level Synthesis + ROS
http://www.pynq.io/
• Write the design in C++ and turn it into an IP core (a ROS node) with HLS
• Design the software in Python on Ubuntu
→ Use ROS Kinetic
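The division of labor on this slide (an HLS-generated IP core doing the inference, Python on Ubuntu gluing it into ROS) can be sketched as a per-frame callback. This is not the speaker's code. All the names here (`make_image_callback`, `run_accelerator`, `publish`, `Detection`) are hypothetical. The hardware call and the ROS publisher are passed in as plain functions so the sketch runs without a board or a ROS installation. On a real system they would presumably wrap a PYNQ overlay and a ROS publisher.

```python
from typing import Callable, List, Sequence, Tuple

Detection = Tuple[str, float]  # (label, score); hypothetical result format

def make_image_callback(
    run_accelerator: Callable[[Sequence[int]], List[Detection]],
    publish: Callable[[List[Detection]], None],
    min_score: float = 0.5,
) -> Callable[[Sequence[int]], None]:
    """Build a per-frame callback: run the IP core, filter the results, publish them."""
    def on_image(pixels: Sequence[int]) -> None:
        detections = run_accelerator(pixels)
        publish([d for d in detections if d[1] >= min_score])
    return on_image

# Stand-in for the FPGA accelerator (assumption for illustration only).
def fake_accelerator(pixels):
    return [("person", 0.9), ("boat", 0.2)]

published = []  # stand-in for a ROS topic
callback = make_image_callback(fake_accelerator, published.append)
callback([0] * 16)  # a ROS subscriber would call this once per camera frame
# published is now [[("person", 0.9)]]
```

Keeping the accelerator behind a plain function means the ROS-side logic can be tested on a PC before the bitstream exists.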

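Prototype
• Host PC (Ubuntu 16.04 + ROS Kinetic)
• Camera: the host PC's camera
• FPGA: ZCU104 board, about 80,000 yen, with a mid-range FPGA (Zynq UltraScale+)
• Actuator: Roomba
• Power for everything except the Roomba: Omnicharge
• The host PC and the FPGA are connected via Ethernet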

A robot that "stalks" (follows) people
→ a security guard robot
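One simple way a person-following behavior could be built on top of an object detector is a proportional controller on the detected bounding box. The sketch below is an assumption about how such a robot might work, not the controller used in the prototype. The function name, the gains, and the target box height are all hypothetical.

```python
def follow_command(box, image_width, image_height,
                   target_height_ratio=0.6, k_turn=1.0, k_forward=0.5):
    """Map a person bounding box (x_min, y_min, x_max, y_max), in pixels,
    to a (linear, angular) velocity pair. Gains and target ratio are hypothetical.
    """
    if box is None:
        return 0.0, 0.0                      # nobody detected: stop
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2.0
    # Horizontal offset in [-1, 1]; positive means the person is to the right.
    offset = (center_x - image_width / 2.0) / (image_width / 2.0)
    # Box height as a fraction of the frame: a rough proxy for distance.
    height_ratio = (y_max - y_min) / float(image_height)
    angular = -k_turn * offset               # negative = turn right (counterclockwise-positive convention)
    linear = k_forward * (target_height_ratio - height_ratio)
    return max(linear, 0.0), angular         # this sketch never drives backwards

# Person far away and to the right in a 640x480 frame: drive forward, turn right.
linear, angular = follow_command((500, 200, 600, 300), 640, 480)
```

In a ROS setup the two values would typically be sent to the base as a velocity command.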

Multiple DL Tasks
• Taskonomy [Zamir et al., CVPR 2018 (best paper)]
• http://taskonomy.stanford.edu/
• Learns and runs 26 kinds of tasks at once
• I want to do this on a robot!
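One common way to run several vision tasks on the same frame is to compute shared features once and attach a small head per task. The sketch below illustrates that arrangement only. It is not necessarily how Taskonomy structures its networks, and the task names, sizes, and random weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
INPUT_DIM = 32     # hypothetical flattened input size
FEATURE_DIM = 16   # hypothetical size of the shared feature vector

# Shared encoder weights plus one small linear head per task (all hypothetical).
encoder_w = rng.standard_normal((FEATURE_DIM, INPUT_DIM))
heads = {
    "object_class": rng.standard_normal((10, FEATURE_DIM)),   # 10 class scores
    "depth": rng.standard_normal((1, FEATURE_DIM)),           # one scalar
    "pose_keypoints": rng.standard_normal((8, FEATURE_DIM)),  # 4 (x, y) pairs
}

def run_all_tasks(x):
    """Compute the shared features once, then evaluate every task head on them."""
    features = np.maximum(encoder_w @ x, 0.0)   # shared encoder + ReLU
    return {name: w @ features for name, w in heads.items()}

outputs = run_all_tasks(rng.standard_normal(INPUT_DIM))
```

Because the encoder runs once per frame, adding a task costs only its head.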

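Summary
• With an FPGA, high-performance DL can be used from ROS (or so I hope)
• Uses on-chip memory
• PYNQ framework
• The drawback is the high design difficulty
→ A framework is under development
• Multiple deep learning tasks
• Multiple image-processing tasks plus audio processing are also possible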
