Slides presented at the FlexTiles Workshop at FPL'2014.
Presentation #3: FlexTiles DSP Accelerators
FlexTiles is a heterogeneous many-core platform reconfigurable at run-time developed within an FP7 project.
1. www.flextiles.eu
FlexTiles
Workshop at FPL’2014 conference: FlexTiles FP7 project Low-Power DSP Accelerator Embedded in a Heterogeneous Many-Core Architecture
Marc MORGAN
CSEM – Swiss Center for Electronics and Microtechnology
2. 1 /
The information contained in this document and any attachments are the property of FlexTiles consortium. You are hereby notified that any review, dissemination, distribution, copying or otherwise use of this document must be done in accordance with the CA of the project (TRT/DJ/624412785.2011). Template version 1.0
CSEM overview on a single slide
•private, not-for-profit company, founded in the 1980s
•approx. 450 employees on 5 sites in Switzerland (HQ in Neuchâtel) and one site in Brazil
•5 research programs:
1.ultra-low power integrated systems (SoC, Vision, Wireless)
2.systems engineering (med tech, instrumentation, automation)
3.MEMS
4.surface engineering (nano, bio, printable electronics)
5.photovoltaic
•approx. 70 MCHF annual budget
•over 20 start-ups and spin-offs since 1995
3. 2 /
Many-core architecture: GPPs + accelerators
An array of general purpose processors (GPP)
Connected via a Network-on-Chip (NoC)
Complemented with accelerators to optimize speed and power:
DSP processors or specialized logic implemented in embedded-FPGA
Plus memory nodes and I/O
4. 3 /
Many-core architecture: GPPs + accelerators (cont’d)
Several IPs are available for the building blocks
both in the consortium and on the market
architectural choices attempt to keep the platform generic
CSEM provides an ultra-low power DSP processor for the DSP accelerator
It plugs into a generic accelerator interface (AI)
5. 4 /
Accelerator interface (AI)
Interfaces the NoC’s NI to the accelerator by providing services:
programming, control/status, data in, data out, debug
DMA access, word FIFOs, notification
6. 5 /
DSP accelerator architecture
Choices for the DSP accelerator avoid DSP-specific features
the DSP will not run an OS or kernel
the DSP will not use (or at least not require) interrupts
Note: CSEM’s icyflex4 ULP DSP could support both of the above
Implement a FIFO manager to handle input and output tokens from/to the accelerator interface (AI)
Implement debug and tracing facilities
Debug: JTAG 1149.1 TAP
Tracing: programmable tracing unit
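The FIFO-based token handling described above, with no OS and no interrupts, can be sketched as a simple polling loop. This is a minimal software model only: the FIFO depth, the `word_fifo` type, and the functions `fifo_push`, `fifo_pop`, `process_token` and `dsp_poll_once` are hypothetical names chosen for illustration, not the FlexTiles or icyflex API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 8u /* hypothetical depth, not taken from the slides */

/* Software model of one word FIFO between the AI and the DSP. */
typedef struct {
    uint32_t words[FIFO_DEPTH];
    size_t head;  /* index of the next word to read */
    size_t count; /* number of words currently stored */
} word_fifo;

/* Append one word; returns false when the FIFO is full. */
bool fifo_push(word_fifo *f, uint32_t w)
{
    assert(f != NULL);
    if (f->count == FIFO_DEPTH)
        return false;
    f->words[(f->head + f->count) % FIFO_DEPTH] = w;
    f->count++;
    return true;
}

/* Remove the oldest word; returns false when the FIFO is empty. */
bool fifo_pop(word_fifo *f, uint32_t *w)
{
    assert(f != NULL && w != NULL);
    if (f->count == 0)
        return false;
    *w = f->words[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}

/* Placeholder kernel: a real DSP task would filter or transform the token. */
uint32_t process_token(uint32_t token)
{
    return token * 2u;
}

/*
 * One polling pass: consume input tokens and emit results until the
 * input FIFO is empty or the output FIFO is full. No interrupt is
 * needed; the caller simply invokes this function in its main loop.
 * Returns the number of tokens processed.
 */
size_t dsp_poll_once(word_fifo *in, word_fifo *out)
{
    size_t done = 0;
    uint32_t token;

    assert(in != NULL && out != NULL);
    /* Check output space first so that no popped token is ever dropped. */
    while (out->count < FIFO_DEPTH && fifo_pop(in, &token)) {
        bool ok = fifo_push(out, process_token(token));
        assert(ok);
        (void)ok;
        done++;
    }
    return done;
}
```

A bare-metal program on the DSP would call `dsp_poll_once` repeatedly from its main loop; in this model, the AI side fills the input FIFO and drains the output FIFO.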
7. 6 /
DSP accelerator architecture (cont’d)
8. 7 /
Management of the DSP accelerator
Each accelerator is managed by software running on GPPs
virtualization manager: allocation of the accelerator
resource manager: control of the accelerator
These managers are in charge of:
transfer of the application (ELF) to the accelerator
signaling the accelerator when to start and when to stop
recovering statistics on usage of the accelerator to optimize the execution of the application on the many-core platform
The tracing unit can be managed from the processor or from the JTAG interface
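The manager duties listed above (transfer the ELF, start, stop, recover statistics) can be sketched as a small state machine on the GPP side. This is only an illustrative model: the `accelerator` type, the program memory size, the `busy_cycles` statistic and all `acc_*` functions are assumptions made for the example, not the actual FlexTiles manager interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACC_PROG_MEM_BYTES 1024u /* hypothetical program memory size */

typedef enum { ACC_IDLE, ACC_LOADED, ACC_RUNNING, ACC_STOPPED } acc_state;

/* Software stand-in for one DSP accelerator as seen by its managers. */
typedef struct {
    acc_state state;
    uint8_t prog_mem[ACC_PROG_MEM_BYTES];
    size_t prog_len;
    uint64_t busy_cycles; /* hypothetical usage statistic */
} accelerator;

/* Transfer the application image; refused while the accelerator runs. */
bool acc_load(accelerator *a, const uint8_t *image, size_t len)
{
    assert(a != NULL && image != NULL);
    if (a->state == ACC_RUNNING || len > ACC_PROG_MEM_BYTES)
        return false;
    memcpy(a->prog_mem, image, len);
    a->prog_len = len;
    a->busy_cycles = 0;
    a->state = ACC_LOADED;
    return true;
}

/* Signal start; only valid once an application has been loaded. */
bool acc_start(accelerator *a)
{
    assert(a != NULL);
    if (a->state != ACC_LOADED && a->state != ACC_STOPPED)
        return false;
    a->state = ACC_RUNNING;
    return true;
}

/* Signal stop; only valid while running. */
bool acc_stop(accelerator *a)
{
    assert(a != NULL);
    if (a->state != ACC_RUNNING)
        return false;
    a->state = ACC_STOPPED;
    return true;
}

/* Model the passage of time: cycles count as busy only while running. */
void acc_tick(accelerator *a, uint64_t cycles)
{
    assert(a != NULL);
    if (a->state == ACC_RUNNING)
        a->busy_cycles += cycles;
}

/* Recover the usage statistic for the resource manager. */
uint64_t acc_busy_cycles(const accelerator *a)
{
    assert(a != NULL);
    return a->busy_cycles;
}
```

In this model a manager would call `acc_load`, `acc_start`, later `acc_stop`, and then read `acc_busy_cycles` to decide how to map the application onto the platform.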
9. 8 /
Tool flow and Model of Computation
Main flow (as read from the diagram labels):
Application (C code)
C to SpearDE representation conversion (Thales)
Data parallelisation / mapping (Thales)
Streaming optimisation (ACE)
Compilation & link (ACE)
Binaries
Other inputs and tools shown in the diagram:
Graphic input (manual) + C kernels
Architecture representation: master cores (GPP), slave cores (eFPGA, DSP); masters control slaves
Library of IPs
Acc compiler or C2VHDL tools (CSEM / UR1 / RUB)
Architecture configuration GUI (KIT)
10. 9 /
icyflex software development kit
GNU C compiler (gcc) v 4.6.3
icyflex instruction parallelism supported by latest releases of gcc
libc and libm from RedHat’s NewLib
software implementation of IEEE floating-point standard
GNU assembler / linker (binutils), v 2.20
BFD / ELF32 object file format
Binary, SREC, IHEX memory image file formats
GNU debugger (gdb), v 6.7.1
Mode 1: instruction set simulator of the icyflex core
Mode 2: On-Chip Debug (OCD) through a JTAG interface
icyflex instruction set simulator (ISS), written in C++
Phase-accurate, pipelined
Wrappers to SystemC, VHDL (Modelsim), Matlab/Simulink
Eclipse integrated development environment, Helios release
CDT C/C++ IDE plug-in
icyflex plug-in
Diagram labels: files .c, .o, .exe, .log; tools gcc, ld, gdb (shown twice)
11. 10 /
icyflex family of ultra-low power processors
Diagram: icyflex1, icyflex2 and icyflex4 positioned by application (Control, Computing, DSP) and power
Compute resources shown: 1 MUL, 2 MAC, 4 MAC … 36 MAC; 12 MAC
Power figures shown: 6 μW/MHz, 25 μW/MHz, 10-150 μW/MHz
power indicated for TSMC 65 nm CMOS
12. 11 /
icyflex2 vs icyflex4
Feature | icyflex2 | icyflex4 (VPS=2)
Optimized for | Control | DSP (P, X, Y memory buses, ISA, HW loops, saturation, …)
Instruction word [bits] | 32 (1 or 2 sub) | 64 (1, 2 or 3 sub)
Memory access [bits] | 8, 16 or 32 | 2x (8, 16, 32, 64, 128)
Data processing [bits] | 16 or 32, trunc | 2x (16 or 32 or 64), full
Single Instr. Multiple Data (SIMD) | No | Yes, up to 8 MAC
Instruction set is reconfigurable on the fly | No | Yes
Software Development Kit (SDK) | common to both: GNU-based tool suite (gcc, gdb) + cycle-accurate instruction set simulator (ISS)
Hardware Devt Kit (HDK) | common to both: FPGA-based, customizable
VPS = Vector Processing Slices in the Vector Processing Unit of the DSP
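The MAC and saturation features listed in the comparison can be illustrated with a plain C reference of a saturating multiply-accumulate loop, the kind of kernel that MAC units and hardware loops typically target. This is a generic sketch: the 16-bit operand and 32-bit accumulator widths and the function names `sat_add32` and `dot_mac_sat` are assumptions for the example and do not describe the actual icyflex datapath or instruction set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two 32-bit values, clamping to the int32_t range instead of wrapping. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;

    if (sum > INT32_MAX)
        return INT32_MAX;
    if (sum < INT32_MIN)
        return INT32_MIN;
    return (int32_t)sum;
}

/*
 * Dot product of two 16-bit vectors with a saturating 32-bit accumulator.
 * Each loop iteration is one multiply-accumulate (MAC); a SIMD unit would
 * perform several of these per cycle.
 */
int32_t dot_mac_sat(const int16_t *x, const int16_t *y, size_t n)
{
    int32_t acc = 0;

    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        /* A 16 x 16 bit product always fits in 32 bits. */
        int32_t prod = (int32_t)x[i] * (int32_t)y[i];
        acc = sat_add32(acc, prod);
    }
    return acc;
}
```

With saturation, an overflowing sum sticks at the largest representable value rather than wrapping to a negative number, which is usually the less harmful failure mode in signal processing.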
13. 12 /
icyflex: reconfigurable instructions and addressing modes
Instruction set: ADD, MUL, SHR, MAC, JUMP, plus blank (configurable) instructions configured at run-time
Datapath blocks shown (twice in the diagram): SHIFT, MUX, ALU, ACC, ACC
Instruction decoding:
cycle N: config MOP
cycle N+1: use MOP
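The configure-then-use sequence (cycle N: config MOP, cycle N+1: use MOP) can be sketched as a toy software model. Everything here is hypothetical: the slide names the SHIFT, MUX, ALU and ACC blocks but does not say how they are wired or what a MOP encodes, so the `mop_config` fields, the datapath order and the functions `mop_configure` and `mop_execute` are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ALU_ADD, ALU_SUB, ALU_AND } alu_op;
typedef enum { MUX_OPERAND_B, MUX_ACC } mux_sel;

/* Hypothetical contents of one configurable ("blank") instruction. */
typedef struct {
    unsigned shift_left; /* SHIFT: left shift applied to operand a (0..31) */
    mux_sel mux;         /* MUX: second ALU input is operand b or ACC */
    alu_op alu;          /* ALU: operation applied to the two inputs */
} mop_config;

typedef struct {
    mop_config slot; /* the single configurable instruction slot */
    bool configured;
    int32_t acc;     /* ACC: result register */
} dsp_model;

/* "Cycle N": store the configuration in the blank instruction slot. */
void mop_configure(dsp_model *d, mop_config c)
{
    assert(d != NULL && c.shift_left < 32);
    d->slot = c;
    d->configured = true;
}

/* "Cycle N+1": execute the configured instruction; false if unconfigured. */
bool mop_execute(dsp_model *d, int32_t a, int32_t b)
{
    uint32_t shifted, rhs, result;

    assert(d != NULL);
    if (!d->configured)
        return false;
    /* Unsigned arithmetic is used so that overflow wraps without UB. */
    shifted = (uint32_t)a << d->slot.shift_left;
    rhs = (d->slot.mux == MUX_ACC) ? (uint32_t)d->acc : (uint32_t)b;
    switch (d->slot.alu) {
    case ALU_ADD: result = shifted + rhs; break;
    case ALU_SUB: result = shifted - rhs; break;
    default:      result = shifted & rhs; break;
    }
    d->acc = (int32_t)result;
    return true;
}
```

The point of the model is only the two-step protocol: the same slot yields different operations depending on the configuration written in the preceding step.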
14. 13 /
DSP in FlexTiles emulators
Emulator 1 (software):
Using Open Virtual Platform (OVP)
Not cycle accurate
The icyflex4 DSP is emulated by a GPP running at a higher frequency
Emulator 2 (hardware):
Using an FPGA board with two Xilinx Virtex6 FPGAs
Uses a DFF version of the DSP accelerator
15. 14 /
Exploitation of FlexTiles results at CSEM
CSEM specializes in low power solutions
A well-balanced multi-processor design can optimize energy consumption by reducing voltage and frequency
For multi-core: we offer CSEM solutions
For many-core: CSEM collaborates with one or more of our partners
including e.g. a follow-up project to produce FlexTiles chips
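The claim that a multi-processor design saves energy by reducing voltage and frequency can be illustrated with the usual first-order CMOS model, in which dynamic power scales with the number of cores, the square of the supply voltage, and the clock frequency. The sketch below is generic textbook arithmetic, not a CSEM figure: it ignores leakage, and the function name `relative_dynamic_power` and the scaling factors in the usage note are assumptions chosen for the example.

```c
#include <assert.h>

/*
 * Dynamic power of `cores` identical cores relative to a single core at
 * nominal voltage and frequency, using the first-order model
 * P ~ cores * C * V^2 * f. The scale factors are ratios to nominal
 * (1.0 = nominal). Leakage power is not modelled.
 */
double relative_dynamic_power(unsigned cores, double v_scale, double f_scale)
{
    assert(v_scale > 0.0 && f_scale > 0.0);
    return (double)cores * v_scale * v_scale * f_scale;
}
```

For instance, `relative_dynamic_power(2, 0.8, 0.5)` models two cores at half the clock frequency and an assumed 80 % of the nominal voltage: total throughput is unchanged for a perfectly parallel workload, while dynamic power drops to about 0.64 of the single-core value.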
16. 15 /
FlexTiles FP7 project
For more information regarding the FlexTiles project, visit:
http://www.flextiles.eu
Please take 5 minutes to fill out the survey
on the project web site under the Contact menu
The FlexTiles project is funded in part by FP7, the seventh framework programme of the European Commission.
17. www.flextiles.eu
FlexTiles
Thank you for your attention!
For more information: http://www.csem.ch
Questions? mailto:marc.morgan@csem.ch