This project has received funding from
the European Union’s Horizon 2020
research and innovation programme
under grant agreement No 688403
www.tulipp.eu
TULIPP
Towards Ubiquitous Low-power Image
Processing Platforms
I. Tchouchenkov
15.05.2018
Partners
• Thales: coordinator and medical use case
• Sundance: hardware
• Hipperos: operating system
• Synective Labs: ADAS use case
• Efficient Innovation: management
• Fraunhofer IOSB: UAV use case
• Ruhr Universität Bochum: FPGA tools
• NTNU: performance tools
Main objectives:
• Objective 1: Define a reference platform for low-power image processing applications
• Objective 2: Instantiate the reference platform through use-case applications
• Objective 3: Demonstrate and plan improvements of defined key performance indicators
• Objective 4: Start up and manage an ecosystem of stakeholders to extend image processing norms
Project objectives
Towards Ubiquitous Low-Power Image Processing Platforms
[Diagram labels: Reference Platform, Component tools, Operating System, Toolchain, Processor, Memory, IO]
WPs
• WP1: Reference platform definition (interfaces & implementation rules)
• Instantiations:
  • WP2: Hardware
  • WP3: Runtime, API, Libraries & OS
  • WP4: Programming Toolchain
• WP5: Use-case description, integration and platform validation (provides feedback)
• WP6: IP protection, Dissemination, Communication, Advisory Board and Exploitation preparation
• WP7: Management, Coordination
• LABEL: Marketing, Ecosystem and Pre-normalisation
Advisory Board and EcoSystem
• Guaranty
• Interconnectivity
• Faster time-to-market
• Open standards
Tasks and WP2 objectives
Objectives:
1. The reference platform instantiation [based on the recommendations given in WP1 and coordinated with WP3 - WP5]
2. A holistic iterative development and optimization concept for low-power high-performance image processing boards [taking into account results of WP3 - WP5]
Tasks:
T2.1 Components and parameters [M03 - M15] Leader: FHG / Participants: RUB, SUN, THL, NTNU, SYN
T2.2 Internal Components Interfaces [M06 - M12] Leader: FHG / Participants: SUN, THL, NTNU, SYN
T2.3 Development of Tulipp Platform [M12 - M34] Leader: SUN / Participants: FHG, THL, NTNU, SYN
Solution: Hardware Selection as a Project
1. REQUIREMENT ANALYSIS: The first step is to understand the user’s requirements within the framework and the environment in which the system is being installed.
2. SYSTEM SPECIFICATION: The system specification must be clearly defined and must reflect the actual application to be handled by the system.
3. EVALUATION AND VALIDATION: The evaluation phase ranks the vendor proposals and determines the one best suited to the user’s requirements. It considers items such as price, availability and technical support.
4. VENDOR SELECTION: This step determines the vendor with the best combination of reputation, reliability, service record, training, delivery time and lease/finance terms.
5. POST-INSTALLATION REVIEW: This step checks how well the user’s requirements were fulfilled.
Analysis of Use Cases

| Use case | Sizes, mm | Weight, g | Interfaces: Input | Interfaces: Output | Resolution: minimal | Resolution: optimal | Power Consumption | Progr. Language |
|---|---|---|---|---|---|---|---|---|
| Medical Imaging | | | PCIe 1x 2.0 minimum | 1x Gigabit Ethernet | 1024x1024 | 1344x1344 | < 10 W, better 5 W | C/C++, OpenCL |
| Automotive | | | Camera Link | Ethernet… | 640x480 | 1024x512 | Few watts | C/C++, OpenCL, CUDA |
| UAV | 120 x 120 | < 300 | 2 x Camera Link | USB, CAN, Ethernet | 376x240 | 640x480 | < 10 W, better 5 W | C/C++, OpenCL, OpenMP |

| Use case | Input, MBits/sec: minimal | Input, MBits/sec: preferred | Output, MBits/sec: minimal | Output, MBits/sec: preferred | Latency, ms |
|---|---|---|---|---|---|
| Medical Imaging | 420 (2 bytes/pixel) | 870 | 900 | 940 | < 170 |
| Automotive | 222 (3 bytes/pixel) | 378 | <1 for control, 250 for video | <1 for control, 400 for video | < 150 |
| UAV | 7 (1 byte/pixel) | 73 | <1 for control, 8 for video | <1 for control, 80 for video | < 100 (optimal 10) |

1. REQUIREMENT ANALYSIS (partially)
2. SYSTEM SPECIFICATION (partially)
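The input rates pair a resolution with a pixel size, which suggests raw, uncompressed video: width x height x bytes per pixel x 8 bits x frame rate. The slides do not state frame rates, so the 25 and 30 frames per second used below are assumptions chosen for illustration, and the function name is hypothetical. With these assumptions the results land close to the listed minimal input rates of 420 and 222 MBits/sec.

```python
def raw_video_mbit_per_s(width, height, bytes_per_pixel, frames_per_second):
    """Raw (uncompressed) video data rate in Mbit/s, with 1 Mbit = 10**6 bits."""
    bits_per_frame = width * height * bytes_per_pixel * 8
    return bits_per_frame * frames_per_second / 1e6


# Frame rates are illustrative assumptions, not values taken from the slides.
medical_min = raw_video_mbit_per_s(1024, 1024, 2, 25)    # medical, minimal resolution
automotive_min = raw_video_mbit_per_s(640, 480, 3, 30)   # automotive, minimal resolution

print(f"Medical imaging: {medical_min:.1f} Mbit/s")
print(f"Automotive:      {automotive_min:.1f} Mbit/s")
```

The same function can be used to check whether a candidate interface (for example Gigabit Ethernet) leaves headroom for a given resolution and frame rate.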
Results achieved: market view SoCs
Comparison of SoCs potentially suitable for low-power high-performance image processing
3. EVALUATION AND VALIDATION

| SoC | Power | Performance | Performance per Watt | Interfaces | Release Year |
|---|---|---|---|---|---|
| Movidius (Myriad2) | 1.2 W | 150 GFlops | | 12 Lanes MIPI, 3xI2C, SPI, 3xI2S, GPIO, PWM, USB3.0, 2-Slot SDIO, 1xUART, 1xGbE, parallel video I/O | 2016 |
| Nvidia Tegra K1 | 11 W | 326 GFlops | | Input: CSI (4x4x1); USB 2.0/3.0, I2C, 2x DSI, PCIe, GPIOs, GbE, HDMI 4K | 2014 |
| Nvidia Tegra X1 | 11 W | 1024 GFlops | | 2x DSI, eDP 1.4 / DP 1.2 / HDMI 2.0, UART, SPI, I2C, I2S, GPIOs, USB 2.0/3.0, PCIe 2.0, GbE | 2015 |
| Nvidia Tegra X2 | 15 W | 1500 GFlops | 100 GFlops/W | 2x DSI, eDP 1.4 / 2x DP 1.2 / HDMI 2.0, CAN, UART, SPI, I2C, I2S, GPIOs, USB 2.0/3.0, PCIe 2.0, GbE | 2017 |
| KeyStone II (66AK2H14) | 14 W | 198 GFlops | | 10-GbE, 2xPCIe 2.0, USB 3.0, 3xI2C, 3xSPI, 2xUART, EMIF16,… | 2014 |
| Sitara (AM5728) | 6.5 W | 10500 DMIPs | | 2 PRU-ICSS, QSPI, 2xPCIe 2.0, GPIO, USB 3.0, 2xCAN, 1xHDMI out, 2xGbE,… | 2015 |
| Snapdragon 820 | ? | 500 GFlops | | USB 3.0, Bluetooth 4.1, 4K Ultra HD, UFS 2.0, eMMC 5.1, SD 3.0 | 2015 |
| Exynos 8890 | ? | 265 GFlops | | UFS 2.0, eMMC 5.1, 4K Ultra HD | 2016 |
| Atom X5 (Z8500) | 2 W | 115 GFlops | | USB 3.0, 1xPCIe 2.0, HDMI 1.4 | 2015 |
| Atom X7 (Z8700) | 2 W | 153 GFlops | | 3xUSB 3.0, 2xPCIe 2.0, HDMI 1.4 | 2015 |
| Apollo Lake (N4700) | 6 W | 230 GFlops | | 6 x PCIe 2.0, HDMI 1.4b, 6x USB 3.0, 2x SATA 3, I2C, SPI,… | 2016 |
| Kalray MPPA2-256 | 20 W | 1500 GFlops | 75 GFlops/W | 2x 40GbE and PCIe x16 3.0 | 2016 |
| Adapteva Epiphany-IV | 2 W | 140 GFlops | 70 GFlops/W | 2xeLink, 24 GPIO | 2013 |
| AMD G-Series (I/J) | 8-15 W | 564 GFlops | | PCIe 3.0 1x4, PCIe 2/3 4x1, 2xUSB3.0, 2xUSB2.0, 2xSATA 2.0/3.0, 2xHDMI 2.0,… | 2016 |
| Stratix 10 (GX/SX 400) | 12.5 W | 1000 GFlops | 80 GFlops/W | PCIe 3.0, 2xUSB 2.0, 3xGbE, 2xUART, 4xSPI, 5xI2C, 1x eMMC 4.5,... | 2016 |
| Zynq 7000 | 1-20 W | 60-1560 GFlops | 72 GFlops/W | 2xUSB2.0, 2xGbE, 2xQSPI, 2xI2C, 2xCAN 2.0, 2xUART,… | 2013 |
| Zynq UltraScale+ | Up to 24 W | 2x better | 2.4x better | PCIe 2.1, 2xUSB3.0, sATA 3.1, 4xGbE, 2x eMMC 4.51, 2xQSPI, 2xI2C, 2xCAN 2.0,… | 2016 |
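Where the comparison lists a single power figure and a single performance figure, the performance-per-Watt entry is simply their quotient. The short sketch below recomputes it for four of the listed devices and ranks them; the dictionary layout and function name are illustrative only.

```python
# (power in W, peak performance in GFlops), copied from the comparison above
socs = {
    "Nvidia Tegra X2": (15.0, 1500.0),
    "Kalray MPPA2-256": (20.0, 1500.0),
    "Adapteva Epiphany-IV": (2.0, 140.0),
    "Stratix 10 (GX/SX 400)": (12.5, 1000.0),
}


def gflops_per_watt(power_w, gflops):
    """Energy efficiency as peak GFlops divided by power in Watts."""
    return gflops / power_w


efficiency = {name: gflops_per_watt(p, g) for name, (p, g) in socs.items()}

# Rank from most to least efficient
ranked = sorted(efficiency.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(f"{name}: {value:.0f} GFlops/W")
```

Devices listed with a power or performance range (for example Zynq 7000) are left out, because a single quotient is not defined for them without further assumptions.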
Evaluation of FPGA and GPU characteristics
Source: www.bertendsp.com
3. EVALUATION AND VALIDATION
Comparison Tegra X2 <-> Stratix 10 <-> Zynq UltraScale+
4. VENDOR SELECTION

| | Tegra X2 | Stratix 10 SoC (400) | Zynq UltraScale+ MPSoC (EV) |
|---|---|---|---|
| Processing System | | | |
| Application Processing Unit | 2 "Denver 2" + 4 ARM Cortex-A57 up to 1.4 GHz | Quad-core ARM Cortex-A53 up to 1.5 GHz, NEON coprocessor | Quad-core ARM Cortex-A53 up to 1.5 GHz |
| Real-Time Processing Unit | - | - | Dual-core ARM Cortex-R5 up to 600 MHz |
| Multimedia Processing | GPU Pascal (256 cores, up to 1122 MHz) | - | GPU ARM Mali-400 (2 cores up to 667 MHz) |
| Memory Interface | LPDDR4, 58.4 GBytes/sec | DDR4, DDR3, LPDDR3, 25.6 GBytes/sec | DDR4, LPDDR4, DDR3, DDR3L, LPDDR3 |
| High-Speed Peripherals | PCIe 2.0, 2xDSI, eDP 1.4, HDMI 2.0, CAN, UART, SPI, I2C, I2S, USB 3.0, GbE,… | PCIe 3.0, 2xUSB 2.0, 3xGbE, 2xUART, 4xSPI, 5xI2C, 1x eMMC 4.5,... | PCIe 2.1, 2xUSB3.0, sATA 3.1, 4xGbE, 2x eMMC 4.51, 2xQSPI, 2xI2C, 2xCAN 2.0,… |
| Programmable Logic | | | |
| Max Logic Cells / System Logic Cells (K) | - | 378 | Up to 1,143 |
| Technology | 16 nm | 14 nm | 16 nm |
| Power | 15 W | 12.5 W | 2-24 W |
Results achieved: components selected
4. VENDOR SELECTION

| | Tegra X2 | Stratix 10 SoC (400) | Zynq UltraScale+ MPSoC (EV) |
|---|---|---|---|
| Processing System | | | |
| Application Processing Unit | 3 | 2 | 2 |
| Real-Time Processing Unit | 0 | 0 | 2 |
| Multimedia Processing | 3 | 0 | 1 |
| Memory Interface | 3 | 2 | 2 |
| High-Speed Peripherals | 3 | 3 | 3 |
| Programmable Logic | | | |
| Max Logic Cells / System Logic Cells (K) | 0 | 2 | 3 |
| Technology | 3 | 3 | 3 |
| Power | 2 | 2 | 2 |
| Scalability | 0 | 0 | 2 |
| Score: | 17 | 14 | 20 |
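The totals on this slide equal the plain sum of the per-criterion points, so equal weighting of all criteria can be inferred, although the slides do not state it. The sketch below reproduces the totals from the listed points; the variable names are illustrative.

```python
criteria = [
    "Application Processing Unit",
    "Real-Time Processing Unit",
    "Multimedia Processing",
    "Memory Interface",
    "High-Speed Peripherals",
    "Max Logic Cells / System Logic Cells (K)",
    "Technology",
    "Power",
    "Scalability",
]

# Points per criterion, in the order of `criteria`, copied from the slide
scores = {
    "Tegra X2": [3, 0, 3, 3, 3, 0, 3, 2, 0],
    "Stratix 10 SoC (400)": [2, 0, 0, 2, 3, 2, 3, 2, 0],
    "Zynq UltraScale+ MPSoC (EV)": [2, 2, 1, 2, 3, 3, 3, 2, 2],
}

# Unweighted sum per candidate (equal weighting is an inference, see above)
totals = {name: sum(points) for name, points in scores.items()}
winner = max(totals, key=totals.get)

for name, total in totals.items():
    print(f"{name}: {total}")
print(f"Highest score: {winner}")
```

If some criteria mattered more than others for a given use case, the `sum` could be replaced by a weighted sum; any such weights would be a new assumption.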
Selected component: Zynq UltraScale+ MPSoC (EV)
Source: xilinx.com
First instance of the Tulipp Hardware Node
Sundance EMC2-Z7030 (Z7015) with a dual-core ARM-A9 and Kintex-7 FPGA
Advantages:
• PC/104 form factor board
• Integrated 1Gb Ethernet, USB 2.0, sATA-2
• PCI Express 2.0 and integrated PCI Express switch
• Multiple boards can be stacked for large I/O solutions
• Expandable with any VITA57.1 FMC I/O module for more flexibility
• Latest Xilinx SDSoC development environment integrated
• Has an upgrade path to the Zynq UltraScale+
WP5: Unmanned Aerial Vehicle (UAV) Use Case
• Uses state-of-the-art stereo algorithms (image correlation)
• Produces a distance image, i.e. an image whose data shows the distance to each object
• Performs real-time stereo depth estimation for obstacle / collision avoidance (for a UAV), i.e. to detect obstacles in the direction of flight
• Based on dual cameras
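For a rectified stereo pair, the distance image follows from the disparity found by image correlation through the standard pinhole relation Z = f * B / d, with focal length f in pixels, camera baseline B in metres and disparity d in pixels. The sketch below illustrates only this relation; it is not the TULIPP implementation, and the calibration values are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d.

    Returns the distance in metres, or None when the disparity is not
    positive (no valid match, or a point at infinity).
    """
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px


# Hypothetical calibration values, not taken from the TULIPP setup
FOCAL_LENGTH_PX = 400.0
BASELINE_M = 0.25

disparities = [0, 10, 20, 50]  # one row of an illustrative disparity image
depths = [depth_from_disparity(d, FOCAL_LENGTH_PX, BASELINE_M) for d in disparities]
print(depths)
```

Because depth is inversely proportional to disparity, nearby obstacles produce large disparities and are resolved more finely than distant ones.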
Implementation of the obstacle avoidance
[Block diagram. Hardware: Obstacle, Stereo camera, EMC2 Board, MAX3223, DJI Matrice. Link labels: USB 2.0, 3.3V TTL, RS232 (-12V +12V).]
[Processing blocks: Find contours, Histogram, Obstacle avoidance, U-Map, Short-Term-Map, API control.]
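The slide names the processing blocks without detail. Assuming "U-Map" refers to a U-disparity map as the term is commonly used in stereo obstacle detection, it is a column-wise histogram of the disparity image: cell (d, u) counts how many pixels in image column u have disparity d. An upright obstacle then appears as a run of high counts at one disparity, which a contour search can pick out. The sketch below is a minimal pure-Python illustration under that assumption, with a synthetic input and a hypothetical function name.

```python
def u_disparity_map(disparity, max_disparity):
    """Column-wise disparity histogram (U-disparity map).

    disparity: list of rows, each a list of integer disparities.
    Returns a grid with max_disparity + 1 rows and one column per image
    column; cell [d][u] counts the pixels in column u with disparity d.
    """
    width = len(disparity[0])
    u_map = [[0] * width for _ in range(max_disparity + 1)]
    for row in disparity:
        for u, d in enumerate(row):
            if 0 < d <= max_disparity:  # skip invalid (non-positive) matches
                u_map[d][u] += 1
    return u_map


# Synthetic disparity image: a near object (disparity 3) covers columns 1-2
disparity_image = [
    [1, 3, 3, 1],
    [1, 3, 3, 1],
    [1, 3, 3, 0],
]
u_map = u_disparity_map(disparity_image, max_disparity=3)
for d, counts in enumerate(u_map):
    print(d, counts)
```

In this example the object shows up as the counts in row 3, columns 1 and 2, while the background spreads over row 1.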
Thank you for your attention!
Questions?
