1. PEAR LAB Utsunomiya Univ.
ARCHITECTURE EXPLORATION OF
INTELLIGENT ROBOT SYSTEM USING
ROS-COMPLIANT FPGA COMPONENT
Takeshi Ohkawa, Kazushi Yamashina, Takuya Matsumoto, Kanemitsu Ootsu, Takashi Yokota
Utsunomiya University, Japan
2016/10/6 RSP2016@ESWEEK, Pittsburgh 1
2. Outline
•Background
•ROS-Compliant FPGA Component
•Proposal
 • Architecture Exploration Method for Intelligent Robot Systems using the ROS-Compliant FPGA Component
•Case Study: Visual SLAM
 • Distributed processing of Visual SLAM between robot and cloud
 • Functional partitioning
 • Architecture exploration at the model level
•Conclusion and future work
3. Background: Requirements for Robot Development
•Requirements for autonomous mobile robots
 • Processing: high performance, e.g. image recognition, SLAM
 • Mobility: low power due to battery operation
•Expectation: introduction of FPGAs into robots
 • Power efficiency: high-performance processing at low power
 • Problem: FPGA development is difficult
•Robot engineering = integration of components
 • The cost of introducing FPGAs must be reduced
•Our solution: the ROS (Robot Operating System) compliant FPGA component
4. ROS (Robot Operating System)
•ROS is a component-based application framework and build toolset for robotic software.
 • ROS is not an OS, and it does not guarantee real-time behavior.
 • ROS runs on Linux (Ubuntu).
•Abundant component library: productivity!
•Communication model in ROS: publish/subscribe
 • Easy to develop, modify, test, and maintain (cf. client/server)
[Figure: publish/subscribe model — a publisher node publishes a message (msg) to a topic; subscriber nodes receive it from the topic; service invocation provides request/response communication between nodes]
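The publish/subscribe model above can be sketched in plain Python. This is a toy in-process message bus, not the actual ROS API (real ROS nodes are separate processes mediated by the ROS master):

```python
# Minimal sketch of the publish/subscribe pattern used by ROS topics.
# Toy in-process bus, NOT the real ROS API.

class TopicBus:
    def __init__(self):
        self.subscribers = {}  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, msg):
        # Deliver the message to every subscriber of the topic.
        for cb in self.subscribers.get(topic, []):
            cb(msg)

bus = TopicBus()
received = []

# A subscriber node registers a callback on a topic...
bus.subscribe("/image_raw", lambda msg: received.append(msg))
# ...and a publisher node sends messages without knowing who listens.
bus.publish("/image_raw", "frame-0")
bus.publish("/image_raw", "frame-1")

print(received)  # → ['frame-0', 'frame-1']
```

The key property, which the deck relies on later, is that publisher and subscriber are decoupled: either side can be replaced without the other noticing.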
5. ROS-Compliant FPGA Component [2]
•ROS-component structure for introducing an FPGA into a ROS system
[Figure: the ROS-compliant component wraps the FPGA with an interface for input (subscribing a topic), an interface for output (publishing a topic), and the communication logic between them]
[2] Kazushi Yamashina, Takeshi Ohkawa, Kanemitsu Ootsu and Takashi Yokota: "Proposal of ROS-Compliant FPGA Component for Low-Power Robotic Systems - case study on image processing application -", Proceedings of the 2nd International Workshop on FPGAs for Software Programmers (FSP 2015), pp. 62-67, 2015.
6. ROS-Compliant FPGA Component: Example of Image Processing
•Application: labeling
•Measured processing time of labeling: 26x faster than SW
[Figure: processing time (s) — FPGA+ARM: 0.032, SW only (ARM): 0.835, SW only (PC): 0.075]
7. Total Latency of the ROS-Compliant FPGA Component
•Speedup: 1.7x
•Conditions: resolution 1920x1080; ZedBoard (Zynq-7020); ARM (PS): Cortex-A9 @ 666 MHz; FPGA (PL): 100 MHz; PC: Core i7 870 @ 2.93 GHz
[Figure: latency breakdown (0–3.5 s) for FPGA+ARM, SW only (ARM), and SW only (PC), divided into five phases:
 1: communication between ROS nodes (publish/subscribe)
 2: from after subscribe to before labeling
 3: labeling itself
 4: from after labeling to before publish
 5: communication between ROS nodes]
8. cReComp*: Automated Component Generator [3]
•Input
 • User logic: Verilog HDL
 • Configuration file: scrp (specification for cReComp), or Python** (builds an AST)
•Output
 • HDL: FIFO control logic
 • C++: ROS node
•Target: Xilinx Zynq
[Figure: cReComp takes user logic (*.v) and a configuration (*.scrp or *.py) and generates a ROS-compliant FPGA component — hardware (the user logic plus FIFO-based data communication behind a HW interface) and software (a ROS node, *.cpp, behind a SW interface)]
*cReComp: creator for Reconfigurable Component
**The Python interface is developed using Pyverilog by S. Takamaeda: https://github.com/PyHDI/Pyverilog
[3] Kazushi Yamashina, Takeshi Ohkawa, Kanemitsu Ootsu, Takashi Yokota,
“cReComp: Automated Design Tool for ROS-Compliant FPGA Component”, IEEE 10th International Symposium
on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-16), Sep. 22, 2016
https://github.com/kazuyamashi/cReComp
9. HW/SW Architecture Exploration Using the ROS-Compliant FPGA Component
•FPGA and software nodes can be swapped easily at runtime
•The dataflow model can be mapped directly onto the runtime!
 • Rapid system prototyping
[Figure: (a) a ROS system with pure software — SW nodes A, B, C, D connected via /topic1–/topic4; (b) the same system as a SW/FPGA hybrid, where SW node C is replaced by an FPGA node on the same topics, reducing latency]
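The swap in the figure can be sketched as a model-level experiment in plain Python. The node names and latencies below are illustrative placeholders, not measured values; the point is that node C's implementation changes without touching the graph:

```python
# Model-level sketch of HW/SW architecture exploration: nodes are stages
# in a dataflow, and node C has interchangeable SW and FPGA variants.
# All latency numbers are illustrative placeholders.

PIPELINE = ["A", "B", "C", "D"]

LATENCY_MS = {
    "A": 5.0, "B": 8.0, "D": 4.0,
    "C_sw": 35.0,    # software implementation of node C
    "C_fpga": 2.0,   # FPGA implementation behind the same topics
}

def total_latency(c_variant):
    # Sum stage latencies along the pipeline, choosing C's variant.
    return sum(LATENCY_MS[c_variant if n == "C" else n] for n in PIPELINE)

sw_only = total_latency("C_sw")
hybrid = total_latency("C_fpga")
print(sw_only, hybrid)  # → 52.0 19.0
```

Because both variants publish and subscribe the same topics, exploring architectures reduces to re-evaluating the same dataflow with different per-node costs.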
11. Outline
•Background
•ROS-Compliant FPGA Component
•Proposal
 • Architecture Exploration Method for Intelligent Robot Systems using the ROS-Compliant FPGA Component
•Case Study: Visual SLAM
 • Distributed processing of Visual SLAM between robot and cloud
 • Functional partitioning
 • Architecture exploration at the model level
•Conclusion and future work
12. SLAM (Simultaneous Localization and Mapping)
•Input: distance sensor values (laser range finder, etc.)
•Output: a map of the robot's surroundings and the robot's own location within that map
•Trade-off: map precision vs. processing time
•A robot cannot carry a high-performance processor due to power constraints
[Figure: SLAM structure [1] — external sensors (camera, LRF/depth) and internal sensors (gyro) feed image data, depth data, and control values into self-localization and mapping, which feed back into each other and output the localization and the map]
13. Visual SLAM
•Briefly: SLAM using only image sensors
•Processing flow example (below): SLAM after image processing
•Problem: large amount of processing
[Figure: processing flow — image input → feature extraction → feature description → feature tracking → feature matching (image processing), then self-localization and mapping (SLAM)]
14. Basic Concept of Distributed SLAM Processing
•Partition the SLAM processing at a certain point and transfer data from the robot to the cloud
•Challenge: reduce the processing load on the robot side by exploring how to off-load processing to cloud servers
[Figure: processing flow from image sensor input through the SLAM front-end (robot side) to the SLAM core and back-end (cloud side, parallel processing on servers); the data volume and processing volume transferred depend on the partition point — features and previous frames go to the cloud, which updates the location and the map]
15. Architecture Exploration by Partitioning and Distributed Processing of SLAM
•Target SLAM implementation: RTAB-Map [16]
•RTAB-Map (Real-Time Appearance-Based Mapping)
 • Input: RGB-D camera
 • Winner of the IROS 2014 Kinect Robot Navigation Contest*
[Figure: 3D map example of our lab built by RTAB-Map (pose graph + point cloud), captured with an RGB-D camera (Kinect)]
[16] RTAB-Map, http://introlab.github.io/rtabmap/ (accessed 2015/12/14)
*Winning the IROS 2014 Microsoft Kinect Challenge - SV ROS (San Jose, CA), http://www.meetup.com/ja/SV-ROS-users/pages/ (accessed 2015/10/28)
16. Bandwidth of Dataflow and Task Load
Experimental environment (PC):
 OS: Ubuntu 14.04 LTS
 CPU: Intel Core i7-4712MQ (max 3.3 GHz)
 Memory: 8 GB
 ROS version: Indigo
 RTAB-Map: 0.10.11
 RGB-D sensor: Kinect for Windows (Microsoft); RGB image 640x480, depth image 320x240, 30 fps
[Figure: measured dataflow (nodes, topics, bandwidths) — the camera node (30 fps) publishes RGB images (27.8 MB/s) and depth images (18.5 MB/s) to rtabmap; self-localization (67.45 ms/frame) publishes odometry (9.3 KB/s, plus 208 KB/s for visualization); map construction (290.45 ms/frame) publishes map data (378 KB/s) and statistics info (2.22 KB/s) to map visualization; processing time per frame is the average over 100 s x 3 runs]
17. Three Candidate Partitioning Points
•Each candidate needs to transfer image data
 • More than 370 Mbps (27.8 + 18.5 = 46.3 MB/s)
 • This does not satisfy the bandwidth available to mobile/wireless autonomous robots
[Figure: the dataflow from the previous slide with three candidate partition points — #1 before self-localization, #2 between self-localization and map construction, #3 after map construction — dividing the robot side from the cloud side]
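The bandwidth claim above can be checked with a short calculation using the measured topic rates:

```python
# Check: transferring raw images exceeds typical wireless bandwidth.
rgb_mb_s = 27.8    # RGB image stream, MB/s (measured)
depth_mb_s = 18.5  # depth image stream, MB/s (measured)

total_mb_s = rgb_mb_s + depth_mb_s  # 46.3 MB/s
total_mbps = total_mb_s * 8         # 1 byte = 8 bits

print(round(total_mb_s, 1), round(total_mbps, 1))  # → 46.3 370.4
```

At over 370 Mbps, even a dedicated 802.11-class wireless link would struggle, which is why every partition that ships raw images off the robot is ruled out.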
18. Result Example: Architecture Exploration at the Model Level
•An ORB feature-extraction node (13.76 ms) is added on the robot side; only the feature vectors (480 KB/s) cross the partition to the cloud, instead of the raw RGB (27.8 MB/s) and depth (18.5 MB/s) images
•On the cloud side, self-localization (53.69 ms) and map construction (276.69 ms) publish odometry (9.3 KB/s, plus 208 KB/s for visualization), map data (378 KB/s), and statistics info (2.22 KB/s) to map visualization
•The model can be directly mapped onto ROS software and the ROS-compliant FPGA component
[Figure: modified dataflow — camera input (30 fps) → added ORB feature-extraction node (robot side) → feature vector (480 KB/s) → self-localization → map construction → map visualization (cloud side)]
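The effect of this partition on the wireless link can be quantified with a quick comparison of the rates given above:

```python
# Compare transferring raw images vs. ORB feature vectors to the cloud.
image_kb_s = (27.8 + 18.5) * 1000  # raw RGB + depth images, in KB/s
feature_kb_s = 480.0               # ORB feature vectors, KB/s (measured)

reduction = image_kb_s / feature_kb_s
print(round(reduction, 1))  # → 96.5
```

Sending feature vectors instead of images cuts the transferred volume by roughly two orders of magnitude, which is what brings the required bandwidth down into the range a wireless robot can actually sustain.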
19. Summary: Results of Architecture Exploration at the Model Level

                              | SW only                                 | Proposed (HW/SW)
 Communication bandwidth      | >370 Mbps                               | 6.38 Mbps
 Transferred data             | images, 46.3 MB/s                       | feature vector, 480 KB/s
 Processing time (robot side) | #1: 0 ms / #2: 67.45 ms / #3: 357.9 ms  | 13.76 ms
 Processing time (cloud side) | #1: 357.9 ms / #2: 276.69 ms / #3: 0 ms | 330.4 ms

(Time and power can be reduced further by using the FPGA)
20. Conclusion
•We have proposed architecture exploration at the model level and the runtime level for intelligent robot systems using the ROS-compliant FPGA component.
•From the Visual SLAM case study, we learned that off-loading part of the Visual SLAM processing to servers outside the robot has the potential to improve processing performance or to reduce power consumption on the robot side.
•Future work
 • Power exploration at the model level, exploration at the runtime level, and implementation of the distributed Visual SLAM architecture.